Mapping the 攻擊 Surface of AI Systems
Systematic walkthrough for identifying and mapping every attack surface in an AI system, from user inputs through model inference to output delivery and tool integrations.
攻擊面 mapping for AI systems goes beyond the traditional web application model of "find all the endpoints and parameters." AI systems introduce attack surfaces at the semantic level: the meaning of inputs matters as much as their format. A syntactically valid 輸入 that is semantically 對抗性 is the core of 提示詞注入. This walkthrough provides a systematic methodology for discovering every point where 攻擊者 can influence AI system behavior.
The goal is not to 測試 these surfaces yet but to create a comprehensive map that ensures nothing is overlooked when 測試 begins. Think of this as reconnaissance, not 利用.
Step 1: Enumerate 輸入 Vectors
Every path through which data enters the AI system is a potential attack vector. AI systems have more 輸入 vectors than most teams realize.
Direct 輸入 Vectors
These are inputs 使用者 explicitly provides:
# Direct 輸入 Vector Inventory
| ID | 輸入 Vector | Type | Validation | Auth Required | Notes |
|----|-------------|------|-----------|---------------|-------|
| DI-1 | Chat message text | Free text | Character limit only | Yes | Primary interaction |
| DI-2 | File uploads | PDF, DOCX, images | File type filter | Yes | Processed by model |
| DI-3 | Image inputs | JPEG, PNG | Size limit | Yes | Multimodal models |
| DI-4 | Voice 輸入 | Audio stream | Duration limit | Yes | Speech-to-text pipeline |
| DI-5 | API parameters | JSON fields | Schema validation | API key | Programmatic access |
| DI-6 | Conversation history | Prior messages | Session-bound | Yes | Context window |
| DI-7 | User profile fields | Structured data | 輸入 sanitization | Yes | May enter context |
| DI-8 | System configuration | Admin settings | Role-restricted | Admin | Prompts, parameters |Indirect 輸入 Vectors
These are data sources 模型 consumes that 攻擊者 may be able to influence:
# Indirect 輸入 Vector Inventory
| ID | 輸入 Vector | Data Source | Attacker Controllable? | Entry Point |
|----|-------------|-------------|----------------------|-------------|
| II-1 | RAG documents | 知識庫 | Possibly (document upload) | Retrieved by query similarity |
| II-2 | Web search results | Search API | Yes (SEO, content injection) | Retrieved dynamically |
| II-3 | Email content | Email integration | Yes (send email to system) | Parsed and summarized |
| II-4 | Calendar events | Calendar API | Yes (send meeting invite) | Event details processed |
| II-5 | 資料庫 records | Application DB | Possibly (data injection) | Retrieved by function calls |
| II-6 | API responses | External APIs | Possibly (compromised API) | Tool call responses |
| II-7 | Code repository | Git integration | Yes (pull request content) | Code review/analysis |
| II-8 | Shared documents | Collaboration tools | Yes (shared doc content) | Summarization, search |Hidden 輸入 Vectors
These inputs are not obvious from documentation but exist in 系統:
# Hidden 輸入 Vector Checklist
- [ ] HTTP headers (User-代理, Referer, Accept-Language) -- processed by any middleware?
- [ ] URL parameters beyond documented ones
- [ ] Cookie values that influence model behavior
- [ ] Client metadata (device type, browser, locale) used in prompt construction
- [ ] Session state that carries between conversations
- [ ] Error messages from other system components fed back to model
- [ ] 監控 data or metrics visible to 模型
- [ ] Cached responses from previous users (shared cache 漏洞)Step 2: Map the Processing Pipeline
Trace how each 輸入 flows through 系統 from entry to 輸出. Every processing stage is a potential point of manipulation or failure.
Pipeline Stage Mapping
"""
pipeline_mapper.py
Template for documenting the processing pipeline stages.
Fill in details from architecture review and black-box observation.
"""
PIPELINE_STAGES = {
"1_input_reception": {
"description": "Raw 輸入 received from client",
"components": ["API gateway", "Load balancer", "WAF"],
"attack_surfaces": [
"Header injection",
"Oversized payload",
"Encoding manipulation (UTF-8, Unicode)",
"Content-type confusion"
],
"controls": ["Rate limiting", "Size limits", "WAF rules"]
},
"2_authentication": {
"description": "User identity verified",
"components": ["Auth service", "Token validation"],
"attack_surfaces": [
"Token theft or forgery",
"Session hijacking",
"Privilege escalation",
"API key leakage"
],
"controls": ["OAuth 2.0", "API key rotation", "Session management"]
},
"3_input_preprocessing": {
"description": "輸入 cleaned and formatted for model consumption",
"components": ["輸入 sanitizer", "Prompt template engine"],
"attack_surfaces": [
"Filter bypass via encoding",
"Template injection",
"輸入 truncation manipulation",
"Unicode normalization attacks"
],
"controls": ["Blocklist filtering", "輸入 length limits"]
},
"4_context_assembly": {
"description": "系統提示詞 + 使用者輸入 + retrieved context assembled",
"components": ["Prompt builder", "RAG retriever", "History manager"],
"attack_surfaces": [
"System 提示詞注入 point",
"RAG 投毒 via retrieved documents",
"History manipulation",
"Context window overflow",
"Instruction hierarchy confusion"
],
"controls": ["Prompt hardening", "Retrieval filtering"]
},
"5_model_inference": {
"description": "Assembled prompt sent to LLM for 推論",
"components": ["Model API", "Inference endpoint"],
"attack_surfaces": [
"Jailbreaking via crafted prompts",
"Model extraction via systematic querying",
"訓練資料 extraction",
"Logprob-based information leakage"
],
"controls": ["安全 訓練", "Content filter", "Temperature settings"]
},
"6_tool_execution": {
"description": "Model-requested tool/function calls executed",
"components": ["Function router", "Tool executors", "Result handler"],
"attack_surfaces": [
"Unauthorized function invocation",
"Parameter injection (SQL, command, path)",
"SSRF through URL parameters",
"Chained tool abuse",
"Result manipulation"
],
"controls": ["Function allow-list", "Parameter validation"]
},
"7_output_processing": {
"description": "Model 輸出 filtered and formatted for delivery",
"components": ["輸出 filter", "PII detector", "Formatter"],
"attack_surfaces": [
"Filter bypass via encoding or formatting",
"PII detector evasion",
"輸出 injection into downstream systems",
"Markdown/HTML injection in rendered 輸出"
],
"controls": ["Content moderation", "PII redaction", "輸出 sanitization"]
},
"8_response_delivery": {
"description": "Processed response returned to client",
"components": ["API response handler", "Streaming handler"],
"attack_surfaces": [
"Response interception",
"Streaming partial response exposure",
"Cache 投毒",
"Response header manipulation"
],
"controls": ["TLS", "Response headers", "Cache controls"]
}
}Data Flow Diagram
Document how data flows between pipeline stages, noting where data format changes occur (format changes are common 漏洞 points):
User 輸入 (text)
│
▼
[輸入 Filter] ──── Blocked inputs logged
│
▼
User 輸入 (sanitized text)
│
├──▶ [RAG Retriever] ──── Query (嵌入向量) ──▶ Vector DB
│ │ │
│ ◀──── Retrieved chunks (text) ───────────┘
│
▼
[Prompt Builder] ◀── System Prompt (template)
│ Conversation History (text)
│ Retrieved Context (text)
│ User Profile (structured → text)
▼
Assembled Prompt (text)
│
▼
[LLM API] ──── Model Response (text + tool calls)
│
├──▶ [Tool Router] ──── Function Call (JSON)
│ │ │
│ │ [External API]
│ │ │
│ ◀──── Tool Result (JSON → text)
│
▼
[輸出 Filter] ──── Blocked outputs logged
│
▼
[PII Redactor]
│
▼
Response to User (text)
Step 3: 識別 Trust Boundary Crossings
Every time data crosses a trust boundary, 存在 an opportunity for attack. Map each crossing and the controls (or lack thereof) at each boundary.
Trust Boundary Analysis
# Trust Boundary Crossing Inventory
| ID | Boundary | Data Crossing | Direction | Controls | Risk |
|----|----------|--------------|-----------|----------|------|
| TB-1 | User → Application | User message text | Inbound | 輸入 filter, rate limit | Medium |
| TB-2 | Application → Model | Assembled prompt | Outbound | None (trusted by model) | High |
| TB-3 | RAG Store → Model | Retrieved documents | Inbound | Relevance filtering only | High |
| TB-4 | Model → Tool Layer | Function call + params | Outbound | Allow-list check | High |
| TB-5 | Tool Layer → External API | API request | Outbound | Auth, URL validation | Medium |
| TB-6 | External API → Tool Layer | API response | Inbound | Response validation | Medium |
| TB-7 | Tool Layer → Model | Tool results | Inbound | Result formatting | High |
| TB-8 | Model → 輸出 Filter | Raw model 輸出 | Outbound | Content moderation | Medium |
| TB-9 | 輸出 Filter → User | Filtered response | Outbound | PII redaction | Low |The highest-risk boundaries are those where controls are weakest or absent. In many AI systems, the boundary between the application and 模型 (TB-2) has no controls at all since the assembled prompt is passed directly to 模型 with full trust.
Step 4: Map External Dependencies and Integrations
External dependencies expand the 攻擊面 beyond the application boundary.
Integration Inventory
# External Integration Inventory
| Integration | Purpose | Data Exchanged | Auth Method | Failure Mode |
|------------|---------|---------------|-------------|--------------|
| OpenAI API | Model 推論 | Prompts, responses | API key | 503, rate limit |
| Pinecone | Vector storage | Embeddings, queries | API key | Timeout, stale data |
| PostgreSQL | User data, config | Structured queries | Connection string | Query failure |
| Stripe API | Payment processing | Transaction data | API key | Payment failure |
| SendGrid | Email delivery | Email content | API key | Delivery failure |
| Google Search | Web search tool | Search queries, results | API key | No results |
| GitHub API | Code analysis tool | Repository content | OAuth 符元 | Rate limit |
## Dependency Risk 評估
| Integration | Compromise Impact | Supply Chain Risk | 監控 |
|------------|------------------|------------------|------------|
| OpenAI API | Full model behavior exposure | Medium (provider risk) | API logs |
| Pinecone | 知識庫 manipulation | Low | Query logs |
| PostgreSQL | Data exfiltration, modification | Low (self-hosted) | DB audit log |Step 5: Document API Endpoints and Parameters
對每個 API endpoint, document every parameter and its potential for abuse.
Endpoint Mapping Template
# API Endpoint Inventory
## POST /api/v1/chat
| Parameter | Type | Required | Validation | 攻擊 Potential |
|-----------|------|----------|-----------|-----------------|
| message | string | Yes | 4096 char limit | Prompt injection |
| conversation_id | UUID | No | Format check | Session hijacking |
| model | string | No | Enum validation | Model switching |
| temperature | float | No | 0.0-2.0 range | 輸出 manipulation |
| tools_enabled | boolean | No | Boolean check | 工具使用 bypass |
| stream | boolean | No | Boolean check | Partial response exposure |
## POST /api/v1/upload
| Parameter | Type | Required | Validation | 攻擊 Potential |
|-----------|------|----------|-----------|-----------------|
| file | binary | Yes | Type + size check | Malicious file processing |
| purpose | string | Yes | Enum validation | Processing path manipulation |
| extract_text | boolean | No | Boolean check | OCR injection |
## GET /api/v1/conversations/{id}
| Parameter | Type | Required | Validation | 攻擊 Potential |
|-----------|------|----------|-----------|-----------------|
| id | UUID | Yes | Format + ownership | IDOR (other user conversations) |
| include_system | boolean | No | Boolean check | 系統提示詞 exposure |
| format | string | No | Enum validation | Response format manipulation |Step 6: Compile the 攻擊 Surface Map
Combine all findings into a structured 攻擊面 map document.
攻擊 Surface 總結 Matrix
| 攻擊 Surface Category | Count | Critical | High | Medium | Low |
|---|---|---|---|---|---|
| Direct 輸入 vectors | 8 | 2 | 3 | 2 | 1 |
| Indirect 輸入 vectors | 8 | 3 | 3 | 2 | 0 |
| Hidden 輸入 vectors | 4 | 0 | 2 | 1 | 1 |
| Pipeline stages | 8 | 2 | 3 | 2 | 1 |
| Trust boundary crossings | 9 | 3 | 3 | 2 | 1 |
| External integrations | 7 | 1 | 2 | 3 | 1 |
| API endpoints/parameters | 15 | 2 | 5 | 5 | 3 |
| Total | 59 | 13 | 21 | 17 | 8 |
Priority 測試 Targets
Based on the 攻擊面 analysis, prioritize 測試 targets:
# 測試 Priority List
## Priority 1 (測試 First)
1. Prompt injection through primary chat 輸入 (DI-1)
2. RAG document injection (II-1) -- if 知識庫 is writable
3. 系統提示詞 extraction via chat (DI-1 → Stage 4)
4. Function calling parameter injection (Stage 6, TB-4)
5. Cross-tenant data leakage (TB-3, if multi-tenant)
## Priority 2 (測試 Second)
6. Indirect injection via web search results (II-2)
7. File upload processing manipulation (DI-2)
8. 輸出 filter bypass (Stage 7, TB-8)
9. Conversation history manipulation (DI-6)
10. API parameter abuse (endpoints inventory)
## Priority 3 (測試 If Time Permits)
11. Hidden 輸入 vectors (HTTP headers, cookies)
12. Model extraction through systematic querying
13. Cache 投毒 (Stage 8)
14. External API response manipulation (TB-6)
15. Error message information disclosureCommon 攻擊 Surface Mapping Mistakes
-
Only mapping direct inputs. The most impactful AI attacks come through indirect channels: RAG documents, web search results, email content, and other data sources 模型 reads. If your 攻擊面 map only covers the chat 輸入, you are missing the majority of the 攻擊面.
-
Ignoring the prompt assembly stage. How 系統 constructs the final prompt from multiple sources (系統提示詞, 使用者輸入, retrieved context, conversation history) creates composition 漏洞. Two individually safe inputs can become dangerous when combined.
-
Treating 模型 as a single component. 模型 推論 stage has multiple sub-surfaces: the prompt interpretation, the 安全 對齊, the content policy, and the 工具使用 decision logic. Each can be attacked independently.
-
Not mapping failure modes. What happens when a component fails? Does 系統 fall back to an unfiltered mode? Does it expose error messages that reveal system internals? Failure modes are attack surfaces too.
-
Mapping once and never updating. 攻擊 surfaces change when new features are added, integrations change, or models are updated. Schedule periodic 攻擊面 reviews, especially after significant system changes.
Which type of 攻擊面 is most commonly overlooked in AI system assessments?
相關主題
- AI Threat Model Creation -- Using the 攻擊面 map to build threat models
- 測試 Plan Development -- Converting 攻擊面 priorities into 測試 plans
- Mapping to MITRE ATLAS -- Mapping attack surfaces to ATLAS techniques
- Reconnaissance Workflow -- Detailed reconnaissance procedures for discovering attack surfaces