AI Attack Surface Mapping
Systematic methodology for identifying all attack vectors in AI systems: input channels, data flows, tool integrations, and trust boundaries.
Attack surface mapping for AI systems is the process of identifying every point of interaction between an attacker and the AI deployment. Unlike traditional web application attack surfaces that focus on HTTP endpoints and input fields, AI attack surfaces include semantic channels (natural language inputs that influence model behavior), data channels (documents, knowledge bases, and training data), and capability channels (tools and APIs that the model can invoke).
The AI Attack Surface Model
AI systems have attack surfaces across six dimensions:
AI Attack Surface Dimensions:
1. Input Channels
├── Direct user input (chat, API)
├── Indirect data (documents, emails, web pages)
├── Multimodal inputs (images, audio, video)
└── System configuration (prompts, parameters)
2. Data Channels
├── RAG knowledge bases
├── Training/fine-tuning data
├── User conversation history
└── External data sources
3. Capability Channels
├── Tool/function calling
├── Code execution
├── API integrations
└── File system access
4. Model Internals
├── Weights and activations
├── Tokenizer
├── Embedding space
└── Attention mechanics
5. Infrastructure
├── API endpoints
├── Authentication/authorization
├── Rate limiting
└── Logging and monitoring
6. Human Channels
├── Operators and administrators
├── Content moderators
├── End users (social engineering)
└── Developer team
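For systematic coverage tracking during an engagement, the six dimensions above can be expressed as a simple checklist structure. This is a minimal sketch: the `ATTACK_SURFACE_DIMENSIONS` table mirrors the outline above, and the `coverage_report` helper is illustrative, not part of any standard tooling.

```python
# Illustrative coverage tracker for the six attack surface dimensions.
ATTACK_SURFACE_DIMENSIONS = {
    "input_channels": ["direct user input", "indirect data",
                       "multimodal inputs", "system configuration"],
    "data_channels": ["RAG knowledge bases", "training/fine-tuning data",
                      "user conversation history", "external data sources"],
    "capability_channels": ["tool/function calling", "code execution",
                            "API integrations", "file system access"],
    "model_internals": ["weights and activations", "tokenizer",
                        "embedding space", "attention mechanics"],
    "infrastructure": ["API endpoints", "authentication/authorization",
                       "rate limiting", "logging and monitoring"],
    "human_channels": ["operators and administrators", "content moderators",
                       "end users", "developer team"],
}

def coverage_report(assessed):
    """Return the surface elements not yet assessed, per dimension."""
    return {
        dim: [item for item in items if item not in assessed.get(dim, [])]
        for dim, items in ATTACK_SURFACE_DIMENSIONS.items()
    }

# After assessing only direct user input, every other element is still open.
gaps = coverage_report({"input_channels": ["direct user input"]})
```

A report like this makes it harder to declare an assessment complete while entire dimensions (model internals and human channels are the usual casualties) remain untouched.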
Mapping Methodology
Phase 1: System Decomposition
Break the AI system into its component parts:
class AISystemDecomposition:
    """Decompose an AI system into mappable components."""

    def __init__(self, system_name):
        self.system = system_name
        self.components = {
            "inputs": [],
            "processing": [],
            "outputs": [],
            "data_stores": [],
            "integrations": [],
            "humans": []
        }

    def map_input_channels(self):
        """Identify all input channels to the AI system."""
        channels = [
            {
                "name": "User chat interface",
                "type": "direct",
                "format": "natural language",
                "attacker_controllable": True,
                "filtered": True  # Has content filter
            },
            {
                "name": "Document upload",
                "type": "indirect",
                "format": "PDF, DOCX, TXT",
                "attacker_controllable": True,
                "filtered": False  # No injection filter on uploads
            },
            {
                "name": "RAG knowledge base",
                "type": "indirect",
                "format": "text chunks",
                "attacker_controllable": "partially",
                "filtered": False
            },
            {
                "name": "System prompt",
                "type": "configuration",
                "format": "text",
                "attacker_controllable": False,
                "filtered": False  # Trusted input
            }
        ]
        self.components["inputs"] = channels
        return channels

Phase 2: Trust Boundary Analysis
Identify where data crosses trust boundaries:
def map_trust_boundaries(system_components):
    """
    Identify trust boundaries in the AI system.
    Each boundary transition is a potential attack point.
    """
    boundaries = []

    # Boundary 1: External -> Application
    boundaries.append({
        "name": "User Input Boundary",
        "from_zone": "untrusted (external)",
        "to_zone": "application layer",
        "data_crossing": ["user messages", "uploaded files"],
        "controls": ["input validation", "content filter"],
        "attack_types": ["prompt injection", "file upload attacks"]
    })

    # Boundary 2: Application -> Model
    boundaries.append({
        "name": "Model Context Boundary",
        "from_zone": "application layer",
        "to_zone": "model context window",
        "data_crossing": ["system prompt", "user message",
                          "retrieved documents", "conversation history"],
        "controls": ["instruction hierarchy"],
        "attack_types": ["indirect injection via retrieved docs",
                         "context window manipulation"]
    })

    # Boundary 3: Model -> Tools
    boundaries.append({
        "name": "Tool Execution Boundary",
        "from_zone": "model decisions",
        "to_zone": "external systems",
        "data_crossing": ["tool calls", "API requests",
                          "database queries", "file operations"],
        "controls": ["tool allowlisting", "parameter validation"],
        "attack_types": ["tool abuse", "data exfiltration",
                         "privilege escalation"]
    })

    # Boundary 4: Data Store -> Application
    boundaries.append({
        "name": "Data Retrieval Boundary",
        "from_zone": "data stores",
        "to_zone": "model context",
        "data_crossing": ["knowledge base chunks",
                          "user history", "cached responses"],
        "controls": ["access control", "content sanitization"],
        "attack_types": ["knowledge base poisoning",
                         "data extraction"]
    })

    return boundaries

Phase 3: Data Flow Mapping
Trace how data moves through the system:
def map_data_flows(system):
    """Map all data flows through the AI system."""
    flows = []

    # User input flow
    flows.append({
        "name": "User Query Processing",
        "path": [
            "User Input",
            "Input Filter",
            "Query Embedding",
            "RAG Retrieval",
            "Context Assembly",
            "Model Inference",
            "Output Filter",
            "Response Delivery"
        ],
        "sensitive_data": ["user query (may contain PII)"],
        "injection_points": ["User Input", "RAG Retrieval"]
    })

    # RAG retrieval flow
    flows.append({
        "name": "Knowledge Retrieval",
        "path": [
            "Query Embedding",
            "Vector Store Search",
            "Chunk Retrieval",
            "Re-ranking",
            "Context Insertion"
        ],
        "sensitive_data": ["knowledge base content"],
        "injection_points": ["Vector Store (if poisoned)",
                             "Context Insertion (if unfiltered)"]
    })

    # Tool execution flow
    flows.append({
        "name": "Tool Call Execution",
        "path": [
            "Model Decision",
            "Tool Call Parsing",
            "Parameter Validation",
            "Tool Execution",
            "Result Processing",
            "Response Integration"
        ],
        "sensitive_data": ["tool parameters", "tool results"],
        "injection_points": ["Model Decision (via injection)",
                             "Parameter Validation (bypass)"]
    })

    return flows

Phase 4: Attack Vector Prioritization
def prioritize_attack_vectors(trust_boundaries, data_flows):
    """Prioritize attack vectors by risk and accessibility."""
    vectors = []
    for boundary in trust_boundaries:
        for attack_type in boundary["attack_types"]:
            vectors.append({
                "vector": attack_type,
                "boundary": boundary["name"],
                "accessibility": score_accessibility(boundary),
                "impact": score_impact(attack_type),
                "defense_coverage": score_defenses(boundary["controls"]),
                "priority": None  # Calculated below
            })

    # Calculate priority score
    for vector in vectors:
        vector["priority"] = (
            vector["accessibility"] * 0.3 +
            vector["impact"] * 0.4 +
            (1 - vector["defense_coverage"]) * 0.3
        )

    # Sort by priority (highest first)
    vectors.sort(key=lambda x: x["priority"], reverse=True)
    return vectors

Producing the Attack Surface Map
The final output is a structured document that guides red team operations:
Attack Surface Map: [Target System Name]
Date: [Assessment Date]
1. SYSTEM OVERVIEW
- Architecture type: RAG-augmented chatbot with tool access
- Model: [identified or suspected]
- Infrastructure: [cloud provider, deployment type]
2. INPUT CHANNELS (ranked by exploitability)
Priority 1: Document upload (no injection filtering)
Priority 2: Direct chat input (basic content filter)
Priority 3: RAG knowledge base (limited write access)
3. TRUST BOUNDARIES
[Diagram showing boundary transitions]
4. DATA FLOWS
[Diagram showing sensitive data movement]
5. TOOL CAPABILITIES
- Email sending (high impact)
- Database queries (data exposure risk)
- File operations (limited to uploads directory)
6. RECOMMENDED ATTACK SEQUENCE
1. Indirect injection via document upload
2. System prompt extraction via chat
3. Tool abuse for data exfiltration
4. Knowledge base poisoning for persistence
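The rankings in sections 2 and 6 of this map come from the Phase 4 priority scores. The scoring helpers referenced there (score_accessibility, score_impact, score_defenses) are engagement-specific and left undefined above; the sketch below shows one plausible scheme. The scales, impact table, and the "0.25 per control" heuristic are all assumptions, not a standard.

```python
# Illustrative scoring scheme; all values are assumptions on a 0-1 scale.
IMPACT_SCORES = {
    "prompt injection": 0.7,
    "data exfiltration": 0.9,
    "privilege escalation": 0.9,
    "tool abuse": 0.8,
}

def score_accessibility(boundary):
    """Boundaries reachable directly from the untrusted zone score highest."""
    return 1.0 if "untrusted" in boundary["from_zone"] else 0.5

def score_impact(attack_type):
    """Look up impact; default to a middling score for unlisted attacks."""
    return IMPACT_SCORES.get(attack_type, 0.6)

def score_defenses(controls):
    """Crude proxy: more documented controls, more coverage (capped at 1.0)."""
    return min(len(controls) * 0.25, 1.0)

# Worked example: prompt injection across the User Input Boundary.
boundary = {
    "name": "User Input Boundary",
    "from_zone": "untrusted (external)",
    "controls": ["input validation", "content filter"],
}
priority = (score_accessibility(boundary) * 0.3
            + score_impact("prompt injection") * 0.4
            + (1 - score_defenses(boundary["controls"])) * 0.3)
# 1.0 * 0.3 + 0.7 * 0.4 + 0.5 * 0.3 = 0.73
```

Whatever scheme is used, the point is consistency: the same rubric applied across every boundary, so the final ranking reflects relative risk rather than assessor mood.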
Related Topics
- Target Profiling — Initial reconnaissance
- Model Identification — Identifying the model
- API Enumeration — Discovering API capabilities
- Defense Taxonomy — Understanding what defenses to expect
Exercise: An AI system accepts user chat (filtered), processes uploaded documents (unfiltered), and has access to email-sending tools. Which attack path should a red team prioritize?
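One way to reason about this scenario is to apply the Phase 4 priority formula to each candidate entry path. The scores below are illustrative assumptions: both paths reach the same email-capable model (hence equal impact), and the only difference is the defensive coverage at the entry point.

```python
# Hypothetical (accessibility, impact, defense_coverage) scores, 0-1 scale.
# Impact is high on both paths because a compromised model can invoke
# the email-sending tool either way.
paths = {
    "indirect injection via document upload": (1.0, 0.9, 0.0),  # unfiltered
    "direct injection via chat":              (1.0, 0.9, 0.5),  # filtered
}

def priority(accessibility, impact, defense_coverage):
    """Weighted score from Phase 4 (weights 0.3 / 0.4 / 0.3)."""
    return accessibility * 0.3 + impact * 0.4 + (1 - defense_coverage) * 0.3

ranked = sorted(paths, key=lambda p: priority(*paths[p]), reverse=True)
# The unfiltered upload path ranks first: same impact, weaker defenses.
```

With equal accessibility and impact, the unfiltered channel wins on defense coverage alone, which matches the recommended attack sequence above: start with indirect injection via document upload.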