Mapping the AI Attack Surface
A systematic methodology for identifying every attack vector in an AI system: input channels, data flows, tool integrations and trust boundaries.
Attack surface mapping for AI systems is the process of identifying every point of interaction between an attacker and the AI deployment. Unlike traditional web application attack surfaces, which focus on HTTP endpoints and input fields, AI attack surfaces include semantic channels (natural-language input that influences the model's behavior), data channels (documents, knowledge bases and training data) and capability channels (tools and APIs the model can call).
The AI Attack Surface Model
AI systems expose attack surface across six dimensions:
AI Attack Surface Dimensions:
1. Input Channels
├── Direct user input (chat, API)
├── Indirect data (documents, emails, web pages)
├── Multimodal inputs (images, audio, video)
└── System configuration (prompts, parameters)
2. Data Channels
├── RAG knowledge bases
├── Training/fine-tuning data
├── User conversation history
└── External data sources
3. Capability Channels
├── Tool/function calling
├── Code execution
├── API integrations
└── File system access
4. Model Internals
├── Weights and activations
├── Tokenizer
├── Embedding space
└── Attention mechanics
5. Infrastructure
├── API endpoints
├── Authentication/authorization
├── Rate limiting
└── Logging and monitoring
6. Human Channels
├── Operators and administrators
├── Content moderators
├── End users (social engineering)
└── Developer team
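Because the six dimensions form a fixed taxonomy, they can double as a coverage checklist during an assessment. A minimal sketch, assuming you log each examined element as a `(dimension, element)` tuple; the dictionary mirrors the tree above with the parenthetical notes dropped, and `unassessed` is a hypothetical helper name:

```python
# Mirror of the six-dimension tree above (parenthetical notes dropped).
ATTACK_SURFACE = {
    "Input Channels": ["Direct user input", "Indirect data",
                       "Multimodal inputs", "System configuration"],
    "Data Channels": ["RAG knowledge bases", "Training/fine-tuning data",
                      "User conversation history", "External data sources"],
    "Capability Channels": ["Tool/function calling", "Code execution",
                            "API integrations", "File system access"],
    "Model Internals": ["Weights and activations", "Tokenizer",
                        "Embedding space", "Attention mechanics"],
    "Infrastructure": ["API endpoints", "Authentication/authorization",
                       "Rate limiting", "Logging and monitoring"],
    "Human Channels": ["Operators and administrators", "Content moderators",
                       "End users", "Developer team"],
}


def unassessed(assessed):
    """Return {dimension: [elements]} for every element not yet examined.

    `assessed` is a set of (dimension, element) tuples (hypothetical format).
    """
    gaps = {}
    for dimension, elements in ATTACK_SURFACE.items():
        missing = [e for e in elements if (dimension, e) not in assessed]
        if missing:
            gaps[dimension] = missing
    return gaps


done = {("Input Channels", "Direct user input"),
        ("Capability Channels", "Tool/function calling")}
gaps = unassessed(done)
print(len(gaps))  # → 6: every dimension still has open elements
```

A dimension disappears from the result only once all of its elements have been examined, which makes untouched areas such as Human Channels easy to spot.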
Mapping Methodology
Phase 1: Decomposing the system
Break the AI system down into its components:
class AISystemDecomposition:
    """Decompose an AI system into components that can be mapped."""

    def __init__(self, system_name):
        self.system = system_name
        self.components = {
            "inputs": [],
            "processing": [],
            "outputs": [],
            "data_stores": [],
            "integrations": [],
            "humans": []
        }

    def map_input_channels(self):
        """Identify every input channel into the AI system."""
        channels = [
            {
                "name": "User chat interface",
                "type": "direct",
                "format": "natural language",
                "attacker_controllable": True,
                "filtered": True  # Has a content filter
            },
            {
                "name": "Document upload",
                "type": "indirect",
                "format": "PDF, DOCX, TXT",
                "attacker_controllable": True,
                "filtered": False  # No injection filter on uploads
            },
            {
                "name": "RAG knowledge base",
                "type": "indirect",
                "format": "text chunks",
                "attacker_controllable": "partially",  # e.g. limited write access
                "filtered": False
            },
            {
                "name": "System prompt",
                "type": "configuration",
                "format": "text",
                "attacker_controllable": False,
                "filtered": False  # Trusted input
            }
        ]
        self.components["inputs"] = channels
        return channels
Phase 2: Trust boundary analysis
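A useful first pass for this phase reuses the Phase 1 inventory: any channel an attacker can control, fully or partially, that has no filter carries untrusted data across a boundary without a control. A minimal sketch, assuming channel dictionaries shaped like those returned by `map_input_channels()` (trimmed to the fields used); `unguarded_channels` is a hypothetical helper name:

```python
def unguarded_channels(channels):
    """Return names of attacker-influenced channels that have no filter.

    "partially" controllable channels count as controllable, since partial
    write access (e.g. to a RAG knowledge base) can be enough to inject.
    """
    return [
        c["name"]
        for c in channels
        if c["attacker_controllable"] in (True, "partially") and not c["filtered"]
    ]


# Records trimmed from AISystemDecomposition.map_input_channels()
channels = [
    {"name": "User chat interface", "attacker_controllable": True, "filtered": True},
    {"name": "Document upload", "attacker_controllable": True, "filtered": False},
    {"name": "RAG knowledge base", "attacker_controllable": "partially", "filtered": False},
    {"name": "System prompt", "attacker_controllable": False, "filtered": False},
]
print(unguarded_channels(channels))
# ['Document upload', 'RAG knowledge base']
```

The system prompt is unfiltered too, but it is excluded because it is not attacker-controllable.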
Determine where data crosses trust boundaries:
def map_trust_boundaries(system_components):
    """
    Identify the trust boundaries in the AI system.
    Every boundary crossing is a potential attack point.

    The boundaries are hard-coded here for illustration;
    system_components is not used by this example.
    """
    boundaries = []

    # Boundary 1: External -> Application
    boundaries.append({
        "name": "User Input Boundary",
        "from_zone": "untrusted (external)",
        "to_zone": "application layer",
        "data_crossing": ["user messages", "uploaded files"],
        "controls": ["input validation", "content filter"],
        "attack_types": ["prompt injection", "file upload attacks"]
    })

    # Boundary 2: Application -> Model
    boundaries.append({
        "name": "Model Context Boundary",
        "from_zone": "application layer",
        "to_zone": "model context window",
        "data_crossing": ["system prompt", "user message",
                          "retrieved documents", "conversation history"],
        "controls": ["instruction hierarchy"],
        "attack_types": ["indirect injection via retrieved docs",
                         "context window manipulation"]
    })

    # Boundary 3: Model -> Tools
    boundaries.append({
        "name": "Tool Execution Boundary",
        "from_zone": "model decisions",
        "to_zone": "external systems",
        "data_crossing": ["tool calls", "API requests",
                          "database queries", "file operations"],
        "controls": ["tool allowlisting", "parameter validation"],
        "attack_types": ["tool abuse", "data exfiltration",
                         "privilege escalation"]
    })

    # Boundary 4: Data store -> Model context
    boundaries.append({
        "name": "Data Retrieval Boundary",
        "from_zone": "data stores",
        "to_zone": "model context",
        "data_crossing": ["knowledge base chunks",
                          "user history", "cached responses"],
        "controls": ["access control", "content sanitization"],
        "attack_types": ["knowledge base poisoning",
                         "data extraction"]
    })

    return boundaries
Phase 3: Mapping data flows
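Once a flow is written down as an ordered path, it can be checked mechanically: for each injection point, does any filtering step sit between it and model inference? A minimal sketch, assuming a path is a list of step names like those in `map_data_flows()` and that filtering steps have "Filter" in their name (a naming convention assumed here, not one the flows guarantee); `unfiltered_injections` is a hypothetical helper name:

```python
def unfiltered_injections(path, injection_points, sink="Model Inference"):
    """Return injection points whose data reaches `sink` without a filter step.

    Assumes filtering steps contain "Filter" in their name (illustrative).
    """
    sink_index = path.index(sink)
    exposed = []
    for point in injection_points:
        start = path.index(point)
        if start >= sink_index:
            continue  # Injected at or after inference, not on the way in
        between = path[start + 1:sink_index]
        if not any("Filter" in step for step in between):
            exposed.append(point)
    return exposed


# The "User Query Processing" path from this phase
path = ["User Input", "Input Filter", "Query Embedding", "RAG Retrieval",
        "Context Assembly", "Model Inference", "Output Filter",
        "Response Delivery"]
print(unfiltered_injections(path, ["User Input", "RAG Retrieval"]))
# ['RAG Retrieval']
```

The result reflects why RAG retrieval is listed as an injection point: retrieved chunks join the context after the input filter has already run.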
Trace how data moves through the system:
def map_data_flows(system):
    """Map every data flow through the AI system.

    The flows are hard-coded for illustration; `system` is not used here.
    """
    flows = []

    # User input flow
    flows.append({
        "name": "User Query Processing",
        "path": [
            "User Input",
            "Input Filter",
            "Query Embedding",
            "RAG Retrieval",
            "Context Assembly",
            "Model Inference",
            "Output Filter",
            "Response Delivery"
        ],
        "sensitive_data": ["user query (may contain PII)"],
        "injection_points": ["User Input", "RAG Retrieval"]
    })

    # RAG retrieval flow
    flows.append({
        "name": "Knowledge Retrieval",
        "path": [
            "Query Embedding",
            "Vector Store Search",
            "Chunk Retrieval",
            "Re-ranking",
            "Context Insertion"
        ],
        "sensitive_data": ["knowledge base content"],
        "injection_points": ["Vector Store (if poisoned)",
                             "Context Insertion (if unfiltered)"]
    })

    # Tool execution flow
    flows.append({
        "name": "Tool Call Execution",
        "path": [
            "Model Decision",
            "Tool Call Parsing",
            "Parameter Validation",
            "Tool Execution",
            "Result Processing",
            "Response Integration"
        ],
        "sensitive_data": ["tool parameters", "tool results"],
        "injection_points": ["Model Decision (via injection)",
                             "Parameter Validation (bypass)"]
    })

    return flows
Phase 4: Prioritizing attack vectors
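The priority score in this phase is a weighted sum: priority = 0.3 × accessibility + 0.4 × impact + 0.3 × (1 − defense_coverage). Because the weights sum to 1.0, the result stays between 0 and 1 whenever each input does (an assumption the code does not enforce). A short worked example with made-up scores shows how the defense term can decide between two vectors of equal impact:

```python
WEIGHTS = {"accessibility": 0.3, "impact": 0.4, "exposure": 0.3}


def priority(accessibility, impact, defense_coverage):
    """Weighted priority from the Phase 4 formula; inputs assumed in [0, 1]."""
    return (
        accessibility * WEIGHTS["accessibility"]
        + impact * WEIGHTS["impact"]
        + (1 - defense_coverage) * WEIGHTS["exposure"]
    )


# Made-up scores for illustration only
weakly_defended = priority(accessibility=0.9, impact=0.7, defense_coverage=0.0)
well_defended = priority(accessibility=1.0, impact=0.7, defense_coverage=0.6)
# weakly_defended = 0.27 + 0.28 + 0.30 = 0.85
# well_defended   = 0.30 + 0.28 + 0.12 = 0.70
print(round(weakly_defended, 2), round(well_defended, 2))
```

The better-defended vector is slightly more accessible (+0.03), but its 0.6 defense coverage costs it 0.18 on the exposure term. The function that follows applies the same weights to every attack type listed at each trust boundary.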
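# Hypothetical placeholder scorers (not part of the method itself): each
# returns a value between 0 and 1; replace them with scoring suited to the
# target system.
def score_accessibility(boundary):
    """Boundaries fed directly by untrusted external input are easiest to reach."""
    return 1.0 if boundary["from_zone"].startswith("untrusted") else 0.5

def score_impact(attack_type):
    """Rough impact lookup; unlisted attack types get a middling score."""
    high_impact = {"tool abuse", "data exfiltration", "privilege escalation"}
    return 0.9 if attack_type in high_impact else 0.6

def score_defenses(controls):
    """Crude coverage proxy: each listed control adds 0.25, capped at 1.0."""
    return min(len(controls) * 0.25, 1.0)

def prioritize_attack_vectors(trust_boundaries, data_flows):
    """Prioritize attack vectors by risk and accessibility (data_flows is unused here)."""
    vectors = []
    for boundary in trust_boundaries:
        for attack_type in boundary["attack_types"]:
            vectors.append({
                "vector": attack_type,
                "boundary": boundary["name"],
                "accessibility": score_accessibility(boundary),
                "impact": score_impact(attack_type),
                "defense_coverage": score_defenses(boundary["controls"]),
                "priority": None  # Computed below
            })
    # Compute the priority score
    for vector in vectors:
        vector["priority"] = (
            vector["accessibility"] * 0.3 +
            vector["impact"] * 0.4 +
            (1 - vector["defense_coverage"]) * 0.3
        )
    # Sort by priority, highest first
    vectors.sort(key=lambda x: x["priority"], reverse=True)
    return vectors
Building the Attack Surface Map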
The final output is a structured document that guides red team operations:
Attack Surface Map: [Target System Name]
Date: [Assessment Date]
1. SYSTEM OVERVIEW
- Architecture type: RAG-augmented chatbot with tool access
- Model: [identified or suspected]
- Infrastructure: [cloud provider, deployment type]
2. INPUT CHANNELS (ranked by exploitability)
Priority 1: Document upload (no injection filtering)
Priority 2: Direct chat input (basic content filter)
Priority 3: RAG knowledge base (limited write access)
3. TRUST BOUNDARIES
[Diagram showing boundary transitions]
4. DATA FLOWS
[Diagram showing sensitive data movement]
5. TOOL CAPABILITIES
- Email sending (high impact)
- Database queries (data exposure risk)
- File operations (limited to uploads directory)
6. RECOMMENDED ATTACK SEQUENCE
1. Indirect injection via document upload
2. System prompt extraction via chat
3. Tool abuse for data exfiltration
4. Knowledge base poisoning for persistence
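Sections of the map can be generated from the data collected in the earlier phases rather than written by hand. A minimal sketch that produces section 2 in the format above, assuming each channel record carries a hypothetical numeric `exploitability` score and a short `note` (neither is produced by the earlier functions; both are illustrative):

```python
def render_input_channels(channels):
    """Render section 2 of the map: input channels ranked by exploitability."""
    ranked = sorted(channels, key=lambda c: c["exploitability"], reverse=True)
    lines = ["2. INPUT CHANNELS (ranked by exploitability)"]
    for rank, channel in enumerate(ranked, start=1):
        lines.append(f"Priority {rank}: {channel['name']} ({channel['note']})")
    return "\n".join(lines)


# Hypothetical scores chosen to reproduce the example map above
channels = [
    {"name": "Direct chat input", "note": "basic content filter", "exploitability": 0.6},
    {"name": "Document upload", "note": "no injection filtering", "exploitability": 0.9},
    {"name": "RAG knowledge base", "note": "limited write access", "exploitability": 0.4},
]
print(render_input_channels(channels))
```

Generating sections this way keeps the map in sync when scores are revised during the engagement.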
Related Topics
- Target Profiling: initial reconnaissance
- Model Identification: identifying the model
- API Enumeration: discovering API capabilities
- Defense Taxonomy: understanding which defenses to expect
An AI system accepts user chat (filtered), processes uploaded documents (unfiltered) and has access to tools for sending email. Which attack path should a red team prioritize?