Agent & Agentic Exploitation
Security overview of autonomous AI agents, covering the expanded attack surface created by tool use, persistent memory, multi-step reasoning, and multi-agent coordination.
AI agents represent a fundamental expansion of the LLM attack surface. While a basic chatbot can only produce text, an agent can execute code, browse the web, send emails, modify files, and interact with external services. Every tool an agent can access becomes a potential attack vector.
What Makes Agents Different
Traditional LLM applications are stateless text-in, text-out systems. Agents add:
- Tool access — Functions the agent can call (file system, APIs, databases, code execution)
- Persistent memory — State that carries across conversations and sessions
- Multi-step reasoning — The agent plans and executes sequences of actions
- Environment interaction — The agent reads from and writes to external systems
- Autonomy — The agent makes decisions without human approval for each step
Each of these capabilities creates new attack surfaces that do not exist in simple chat interfaces.
The Agent Attack Surface
┌─────────────────────┐
│ Tool Definitions │ ← Tool poisoning
└──────────┬──────────┘
│
User Input ──→ Agent LLM ──→ Tool Calls ──→ External Systems
↑ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ │ Memory │ │ Results │ │ Side │
│ │ Store │ │ Parsing │ │ Effects │
│ └─────────┘ └─────────┘ └─────────┘
│ ↑ Memory ↑ Indirect ↑ Real-world
│ poisoning injection impact
└──── Prompt injection via any input channel
| Attack Category | Description | Impact |
|---|---|---|
| Tool abuse | Manipulate which tools the agent calls and with what parameters | Code execution, data exfiltration, privilege escalation |
| CoT manipulation | Steer the agent's reasoning process to reach attacker-desired conclusions | Subtle behavior modification, goal hijacking |
| Multi-agent attacks | Exploit trust relationships between cooperating agents | Cascade failures, inter-agent injection |
| Memory poisoning | Inject persistent instructions into the agent's memory | Long-term backdoors, cross-session attacks |
The MCP Attack Surface
The Model Context Protocol (MCP) standardizes how agents discover and invoke tools. MCP introduces specific attack vectors:
- Tool enumeration — An attacker can see what tools are available to an agent
- Tool description manipulation — Malicious tool descriptions can steer agent behavior
- Parameter injection — Crafted inputs cause the agent to pass attacker-controlled values to tools
- Transport-level attacks — Man-in-the-middle on the stdio or HTTP/SSE transport
Key Principles for Agent Red Teaming
- Map the tool surface first — Before testing injections, enumerate every tool the agent can access and understand its capabilities and permissions
- Tools amplify injection impact — Every tool is a potential exfiltration channel or destructive capability
- Memory creates persistence — Injections stored in agent memory persist beyond the current session
- Trust boundaries are implicit — Agents typically trust tool outputs and other agents without verification
- Autonomy increases blast radius — Agents that act without human confirmation are higher-impact targets
Learning Path
Start with Tool Use Exploitation to understand the most common and impactful agent attack vector, then progress to Chain-of-Thought Manipulation for subtler techniques, and finally Multi-Agent Attacks for the most complex scenarios.
Related Topics
- Prompt Injection & Jailbreaks — The foundational vulnerability that agent exploitation amplifies
- Agent Architectures — Understanding ReAct, tool use, and memory patterns that create the attack surface
- Lab: Agent Exploitation — Hands-on practice exploiting agent tool use and reasoning
- API Security — Securing the tool interfaces and transport layers agents depend on
- MCP Security — Attack vectors specific to the Model Context Protocol
References
- Greshake, K. et al. (2023). "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"
- Zhan, Q. et al. (2024). "InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents"
- Ruan, Y. et al. (2024). "Identifying the Risks of LM Agents with an LM-Emulated Sandbox"
- OWASP (2025). OWASP Top 10 for LLM Applications
- Xi, Z. et al. (2023). "The Rise and Potential of Large Language Model Based Agents: A Survey"
Why does tool access fundamentally change the risk profile of prompt injection?