What is Tool Use Exploitation?

How to manipulate AI agents into calling tools with attacker-controlled parameters, abusing tool capabilities for data exfiltration, privilege escalation, and unauthorized actions.

What is Chain-of-Thought Manipulation?

Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.

What is Multi-Agent Attacks?

Exploitation techniques for multi-agent architectures including inter-agent injection, trust boundary violations, cascading compromises, and A2A protocol attacks.

What is Permission Boundary Bypass?

Escalating from limited to elevated permissions in AI agent systems through scope creep, implicit permission inheritance, and capability confusion.

What is Markdown Image Injection?

Injecting markdown image tags with attacker-controlled URLs to exfiltrate conversation data via HTTP image requests.

What is Link-Based Exfiltration?

Using hyperlinks, redirects, or URL parameters to exfiltrate data from AI systems through markdown links, tool-generated URLs, and API callback exploitation.

What is Tool Use Exploitation (Advanced)?

Comprehensive techniques for exploiting how AI agents call external tools and APIs, including tool description poisoning, overly permissive access abuse, and tool output manipulation.

What is Agent Memory Poisoning?

Techniques for injecting malicious content into agent memory systems -- conversation history, RAG stores, and vector databases -- to achieve persistent cross-session compromise.

What is Agent Goal Hijacking?

Techniques for redirecting AI agent objectives through poisoned inputs, indirect prompt injection, and multi-step manipulation -- the #1 ranked risk in OWASP's 2026 Agentic Top 10.

What is Agent Delegation Attacks?

Exploiting multi-agent delegation patterns to achieve lateral movement, privilege escalation, and command-and-control through impersonation and insecure inter-agent communication.

Agent & Agentic Exploitation

beginner6 min readUpdated 2026-03-12

Security overview of autonomous AI agents, covering the expanded attack surface created by tool use, persistent memory, multi-step reasoning, and multi-agent coordination.

agents agentic-ai tool-use mcp security

AI agents represent a fundamental expansion of the LLM attack surface. While a basic chatbot can only produce text, an agent can execute code, browse the web, send emails, modify files, and interact with external services. Every tool an agent can access becomes a potential attack vector.

What Makes Agents Different

Traditional LLM applications are stateless text-in, text-out systems. Agents add:

Tool access — Functions the agent can call (file system, APIs, databases, code execution)
Persistent memory — State that carries across conversations and sessions
Multi-step reasoning — The agent plans and executes sequences of actions
Environment interaction — The agent reads from and writes to external systems
Autonomy — The agent makes decisions without human approval for each step

Each of these capabilities creates new attack surfaces that do not exist in simple chat interfaces.

The Agent Attack Surface

                    ┌─────────────────────┐
                    │   Tool Definitions   │ ← Tool poisoning
                    └──────────┬──────────┘
                               │
User Input ──→ Agent LLM ──→ Tool Calls ──→ External Systems
     ↑              │              │               │
     │         ┌────┴────┐   ┌────┴────┐     ┌────┴────┐
     │         │ Memory  │   │ Results │     │  Side   │
     │         │ Store   │   │ Parsing │     │ Effects │
     │         └─────────┘   └─────────┘     └─────────┘
     │          ↑ Memory         ↑ Indirect       ↑ Real-world
     │            poisoning        injection        impact
     └──── Prompt injection via any input channel

Attack Category	Description	Impact
Tool abuse	Manipulate which tools the agent calls and with what parameters	Code execution, data exfiltration, privilege escalation
CoT manipulation	Steer the agent's reasoning process to reach attacker-desired conclusions	Subtle behavior modification, goal hijacking
Multi-agent attacks	Exploit trust relationships between cooperating agents	Cascade failures, inter-agent injection
Memory poisoning	Inject persistent instructions into the agent's memory	Long-term backdoors, cross-session attacks

The MCP Attack Surface

The Model Context Protocol (MCP) standardizes how agents discover and invoke tools. MCP introduces specific attack vectors:

Tool enumeration — An attacker can see what tools are available to an agent
Tool description manipulation — Malicious tool descriptions can steer agent behavior
Parameter injection — Crafted inputs cause the agent to pass attacker-controlled values to tools
Transport-level attacks — Man-in-the-middle on the stdio or HTTP/SSE transport

Key Principles for Agent Red Teaming

Map the tool surface first — Before testing injections, enumerate every tool the agent can access and understand its capabilities and permissions
Tools amplify injection impact — Every tool is a potential exfiltration channel or destructive capability
Memory creates persistence — Injections stored in agent memory persist beyond the current session
Trust boundaries are implicit — Agents typically trust tool outputs and other agents without verification
Autonomy increases blast radius — Agents that act without human confirmation are higher-impact targets

Learning Path

Start with Tool Use Exploitation to understand the most common and impactful agent attack vector, then progress to Chain-of-Thought Manipulation for subtler techniques, and finally Multi-Agent Attacks for the most complex scenarios.

Prompt Injection & Jailbreaks — The foundational vulnerability that agent exploitation amplifies
Agent Architectures — Understanding ReAct, tool use, and memory patterns that create the attack surface
Lab: Agent Exploitation — Hands-on practice exploiting agent tool use and reasoning
API Security — Securing the tool interfaces and transport layers agents depend on
MCP Security — Attack vectors specific to the Model Context Protocol

References

Greshake, K. et al. (2023). "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"
Zhan, Q. et al. (2024). "InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents"
Ruan, Y. et al. (2024). "Identifying the Risks of LM Agents with an LM-Emulated Sandbox"
OWASP (2025). OWASP Top 10 for LLM Applications
Xi, Z. et al. (2023). "The Rise and Potential of Large Language Model Based Agents: A Survey"

Knowledge Check

Why does tool access fundamentally change the risk profile of prompt injection?

Agent & Agentic Exploitation

Learning Path

Related articles

Agent & Agentic Exploitation

Learning Path

Related articles