Mapping the Attack Surface of AI Systems
Systematic walkthrough for identifying and mapping every attack surface in an AI system, from user inputs through model inference to output delivery and tool integrations.
Attack surface mapping for AI systems goes beyond the traditional web application model of "find all the endpoints and parameters." AI systems introduce attack surfaces at the semantic level: the meaning of inputs matters as much as their format. A syntactically valid input that is semantically adversarial is the core of prompt injection. This walkthrough provides a systematic methodology for discovering every point where an attacker can influence AI system behavior.
The goal is not to test these surfaces yet but to create a comprehensive map that ensures nothing is overlooked when testing begins. Think of this as reconnaissance, not exploitation.
Step 1: Enumerate Input Vectors
Every path through which data enters the AI system is a potential attack vector. AI systems have more input vectors than most teams realize.
Direct Input Vectors
These are inputs the user explicitly provides:
# Direct Input Vector Inventory
| ID | Input Vector | Type | Validation | Auth Required | Notes |
|----|-------------|------|-----------|---------------|-------|
| DI-1 | Chat message text | Free text | Character limit only | Yes | Primary interaction |
| DI-2 | File uploads | PDF, DOCX, images | File type filter | Yes | Processed by model |
| DI-3 | Image inputs | JPEG, PNG | Size limit | Yes | Multimodal models |
| DI-4 | Voice input | Audio stream | Duration limit | Yes | Speech-to-text pipeline |
| DI-5 | API parameters | JSON fields | Schema validation | API key | Programmatic access |
| DI-6 | Conversation history | Prior messages | Session-bound | Yes | Context window |
| DI-7 | User profile fields | Structured data | Input sanitization | Yes | May enter context |
| DI-8 | System configuration | Admin settings | Role-restricted | Admin | Prompts, parameters |Indirect Input Vectors
These are data sources the model consumes that an attacker may be able to influence:
# Indirect Input Vector Inventory
| ID | Input Vector | Data Source | Attacker Controllable? | Entry Point |
|----|-------------|-------------|----------------------|-------------|
| II-1 | RAG documents | Knowledge base | Possibly (document upload) | Retrieved by query similarity |
| II-2 | Web search results | Search API | Yes (SEO, content injection) | Retrieved dynamically |
| II-3 | Email content | Email integration | Yes (send email to system) | Parsed and summarized |
| II-4 | Calendar events | Calendar API | Yes (send meeting invite) | Event details processed |
| II-5 | Database records | Application DB | Possibly (data injection) | Retrieved by function calls |
| II-6 | API responses | External APIs | Possibly (compromised API) | Tool call responses |
| II-7 | Code repository | Git integration | Yes (pull request content) | Code review/analysis |
| II-8 | Shared documents | Collaboration tools | Yes (shared doc content) | Summarization, search |Hidden Input Vectors
These inputs are not obvious from documentation but exist in the system:
# Hidden Input Vector Checklist
- [ ] HTTP headers (User-Agent, Referer, Accept-Language) -- processed by any middleware?
- [ ] URL parameters beyond documented ones
- [ ] Cookie values that influence model behavior
- [ ] Client metadata (device type, browser, locale) used in prompt construction
- [ ] Session state that carries between conversations
- [ ] Error messages from other system components fed back to model
- [ ] Monitoring data or metrics visible to the model
- [ ] Cached responses from previous users (shared cache vulnerability)Step 2: Map the Processing Pipeline
Trace how each input flows through the system from entry to output. Every processing stage is a potential point of manipulation or failure.
Pipeline Stage Mapping
"""
pipeline_mapper.py
Template for documenting the processing pipeline stages.
Fill in details from architecture review and black-box observation.
"""
PIPELINE_STAGES = {
"1_input_reception": {
"description": "Raw input received from client",
"components": ["API gateway", "Load balancer", "WAF"],
"attack_surfaces": [
"Header injection",
"Oversized payload",
"Encoding manipulation (UTF-8, Unicode)",
"Content-type confusion"
],
"controls": ["Rate limiting", "Size limits", "WAF rules"]
},
"2_authentication": {
"description": "User identity verified",
"components": ["Auth service", "Token validation"],
"attack_surfaces": [
"Token theft or forgery",
"Session hijacking",
"Privilege escalation",
"API key leakage"
],
"controls": ["OAuth 2.0", "API key rotation", "Session management"]
},
"3_input_preprocessing": {
"description": "Input cleaned and formatted for model consumption",
"components": ["Input sanitizer", "Prompt template engine"],
"attack_surfaces": [
"Filter bypass via encoding",
"Template injection",
"Input truncation manipulation",
"Unicode normalization attacks"
],
"controls": ["Blocklist filtering", "Input length limits"]
},
"4_context_assembly": {
"description": "System prompt + user input + retrieved context assembled",
"components": ["Prompt builder", "RAG retriever", "History manager"],
"attack_surfaces": [
"System prompt injection point",
"RAG poisoning via retrieved documents",
"History manipulation",
"Context window overflow",
"Instruction hierarchy confusion"
],
"controls": ["Prompt hardening", "Retrieval filtering"]
},
"5_model_inference": {
"description": "Assembled prompt sent to LLM for inference",
"components": ["Model API", "Inference endpoint"],
"attack_surfaces": [
"Jailbreaking via crafted prompts",
"Model extraction via systematic querying",
"Training data extraction",
"Logprob-based information leakage"
],
"controls": ["Safety training", "Content filter", "Temperature settings"]
},
"6_tool_execution": {
"description": "Model-requested tool/function calls executed",
"components": ["Function router", "Tool executors", "Result handler"],
"attack_surfaces": [
"Unauthorized function invocation",
"Parameter injection (SQL, command, path)",
"SSRF through URL parameters",
"Chained tool abuse",
"Result manipulation"
],
"controls": ["Function allow-list", "Parameter validation"]
},
"7_output_processing": {
"description": "Model output filtered and formatted for delivery",
"components": ["Output filter", "PII detector", "Formatter"],
"attack_surfaces": [
"Filter bypass via encoding or formatting",
"PII detector evasion",
"Output injection into downstream systems",
"Markdown/HTML injection in rendered output"
],
"controls": ["Content moderation", "PII redaction", "Output sanitization"]
},
"8_response_delivery": {
"description": "Processed response returned to client",
"components": ["API response handler", "Streaming handler"],
"attack_surfaces": [
"Response interception",
"Streaming partial response exposure",
"Cache poisoning",
"Response header manipulation"
],
"controls": ["TLS", "Response headers", "Cache controls"]
}
}Data Flow Diagram
Document how data flows between pipeline stages, noting where data format changes occur (format changes are common vulnerability points):
User Input (text)
│
▼
[Input Filter] ──── Blocked inputs logged
│
▼
User Input (sanitized text)
│
├──▶ [RAG Retriever] ──── Query (embedding) ──▶ Vector DB
│ │ │
│ ◀──── Retrieved chunks (text) ───────────┘
│
▼
[Prompt Builder] ◀── System Prompt (template)
│ Conversation History (text)
│ Retrieved Context (text)
│ User Profile (structured → text)
▼
Assembled Prompt (text)
│
▼
[LLM API] ──── Model Response (text + tool calls)
│
├──▶ [Tool Router] ──── Function Call (JSON)
│ │ │
│ │ [External API]
│ │ │
│ ◀──── Tool Result (JSON → text)
│
▼
[Output Filter] ──── Blocked outputs logged
│
▼
[PII Redactor]
│
▼
Response to User (text)
Step 3: Identify Trust Boundary Crossings
Every time data crosses a trust boundary, there is an opportunity for attack. Map each crossing and the controls (or lack thereof) at each boundary.
Trust Boundary Analysis
# Trust Boundary Crossing Inventory
| ID | Boundary | Data Crossing | Direction | Controls | Risk |
|----|----------|--------------|-----------|----------|------|
| TB-1 | User → Application | User message text | Inbound | Input filter, rate limit | Medium |
| TB-2 | Application → Model | Assembled prompt | Outbound | None (trusted by model) | High |
| TB-3 | RAG Store → Model | Retrieved documents | Inbound | Relevance filtering only | High |
| TB-4 | Model → Tool Layer | Function call + params | Outbound | Allow-list check | High |
| TB-5 | Tool Layer → External API | API request | Outbound | Auth, URL validation | Medium |
| TB-6 | External API → Tool Layer | API response | Inbound | Response validation | Medium |
| TB-7 | Tool Layer → Model | Tool results | Inbound | Result formatting | High |
| TB-8 | Model → Output Filter | Raw model output | Outbound | Content moderation | Medium |
| TB-9 | Output Filter → User | Filtered response | Outbound | PII redaction | Low |The highest-risk boundaries are those where controls are weakest or absent. In many AI systems, the boundary between the application and the model (TB-2) has no controls at all since the assembled prompt is passed directly to the model with full trust.
Step 4: Map External Dependencies and Integrations
External dependencies expand the attack surface beyond the application boundary.
Integration Inventory
# External Integration Inventory
| Integration | Purpose | Data Exchanged | Auth Method | Failure Mode |
|------------|---------|---------------|-------------|--------------|
| OpenAI API | Model inference | Prompts, responses | API key | 503, rate limit |
| Pinecone | Vector storage | Embeddings, queries | API key | Timeout, stale data |
| PostgreSQL | User data, config | Structured queries | Connection string | Query failure |
| Stripe API | Payment processing | Transaction data | API key | Payment failure |
| SendGrid | Email delivery | Email content | API key | Delivery failure |
| Google Search | Web search tool | Search queries, results | API key | No results |
| GitHub API | Code analysis tool | Repository content | OAuth token | Rate limit |
## Dependency Risk Assessment
| Integration | Compromise Impact | Supply Chain Risk | Monitoring |
|------------|------------------|------------------|------------|
| OpenAI API | Full model behavior exposure | Medium (provider risk) | API logs |
| Pinecone | Knowledge base manipulation | Low | Query logs |
| PostgreSQL | Data exfiltration, modification | Low (self-hosted) | DB audit log |Step 5: Document API Endpoints and Parameters
For each API endpoint, document every parameter and its potential for abuse.
Endpoint Mapping Template
# API Endpoint Inventory
## POST /api/v1/chat
| Parameter | Type | Required | Validation | Attack Potential |
|-----------|------|----------|-----------|-----------------|
| message | string | Yes | 4096 char limit | Prompt injection |
| conversation_id | UUID | No | Format check | Session hijacking |
| model | string | No | Enum validation | Model switching |
| temperature | float | No | 0.0-2.0 range | Output manipulation |
| tools_enabled | boolean | No | Boolean check | Tool use bypass |
| stream | boolean | No | Boolean check | Partial response exposure |
## POST /api/v1/upload
| Parameter | Type | Required | Validation | Attack Potential |
|-----------|------|----------|-----------|-----------------|
| file | binary | Yes | Type + size check | Malicious file processing |
| purpose | string | Yes | Enum validation | Processing path manipulation |
| extract_text | boolean | No | Boolean check | OCR injection |
## GET /api/v1/conversations/{id}
| Parameter | Type | Required | Validation | Attack Potential |
|-----------|------|----------|-----------|-----------------|
| id | UUID | Yes | Format + ownership | IDOR (other user conversations) |
| include_system | boolean | No | Boolean check | System prompt exposure |
| format | string | No | Enum validation | Response format manipulation |Step 6: Compile the Attack Surface Map
Combine all findings into a structured attack surface map document.
Attack Surface Summary Matrix
| Attack Surface Category | Count | Critical | High | Medium | Low |
|---|---|---|---|---|---|
| Direct input vectors | 8 | 2 | 3 | 2 | 1 |
| Indirect input vectors | 8 | 3 | 3 | 2 | 0 |
| Hidden input vectors | 4 | 0 | 2 | 1 | 1 |
| Pipeline stages | 8 | 2 | 3 | 2 | 1 |
| Trust boundary crossings | 9 | 3 | 3 | 2 | 1 |
| External integrations | 7 | 1 | 2 | 3 | 1 |
| API endpoints/parameters | 15 | 2 | 5 | 5 | 3 |
| Total | 59 | 13 | 21 | 17 | 8 |
Priority Testing Targets
Based on the attack surface analysis, prioritize testing targets:
# Testing Priority List
## Priority 1 (Test First)
1. Prompt injection through primary chat input (DI-1)
2. RAG document injection (II-1) -- if knowledge base is writable
3. System prompt extraction via chat (DI-1 → Stage 4)
4. Function calling parameter injection (Stage 6, TB-4)
5. Cross-tenant data leakage (TB-3, if multi-tenant)
## Priority 2 (Test Second)
6. Indirect injection via web search results (II-2)
7. File upload processing manipulation (DI-2)
8. Output filter bypass (Stage 7, TB-8)
9. Conversation history manipulation (DI-6)
10. API parameter abuse (endpoints inventory)
## Priority 3 (Test If Time Permits)
11. Hidden input vectors (HTTP headers, cookies)
12. Model extraction through systematic querying
13. Cache poisoning (Stage 8)
14. External API response manipulation (TB-6)
15. Error message information disclosureCommon Attack Surface Mapping Mistakes
-
Only mapping direct inputs. The most impactful AI attacks come through indirect channels: RAG documents, web search results, email content, and other data sources the model reads. If your attack surface map only covers the chat input, you are missing the majority of the attack surface.
-
Ignoring the prompt assembly stage. How the system constructs the final prompt from multiple sources (system prompt, user input, retrieved context, conversation history) creates composition vulnerabilities. Two individually safe inputs can become dangerous when combined.
-
Treating the model as a single component. The model inference stage has multiple sub-surfaces: the prompt interpretation, the safety alignment, the content policy, and the tool use decision logic. Each can be attacked independently.
-
Not mapping failure modes. What happens when a component fails? Does the system fall back to an unfiltered mode? Does it expose error messages that reveal system internals? Failure modes are attack surfaces too.
-
Mapping once and never updating. Attack surfaces change when new features are added, integrations change, or models are updated. Schedule periodic attack surface reviews, especially after significant system changes.
Which type of attack surface is most commonly overlooked in AI system assessments?
Related Topics
- AI Threat Model Creation -- Using the attack surface map to build threat models
- Test Plan Development -- Converting attack surface priorities into test plans
- Mapping to MITRE ATLAS -- Mapping attack surfaces to ATLAS techniques
- Reconnaissance Workflow -- Detailed reconnaissance procedures for discovering attack surfaces