AI Attack Surface Mapping
Systematic methodology for identifying all attack vectors in AI systems: input channels, data flows, tool integrations, and trust boundaries.
Attack surface mapping for AI systems is the process of identifying every point of interaction between an attacker and the AI deployment. Unlike traditional web application attack surfaces that focus on HTTP endpoints and input fields, AI attack surfaces include semantic channels (natural language inputs that influence model behavior), data channels (documents, knowledge bases, and training data), and capability channels (tools and APIs that the model can invoke).
The AI Attack Surface Model
AI systems have attack surfaces across six dimensions:
AI Attack Surface Dimensions:
1. Input Channels
├── Direct user input (chat, API)
├── Indirect data (documents, emails, web pages)
├── Multimodal inputs (images, audio, video)
└── System configuration (prompts, parameters)
2. Data Channels
├── RAG knowledge bases
├── Training/fine-tuning data
├── User conversation history
└── External data sources
3. Capability Channels
├── Tool/function calling
├── Code execution
├── API integrations
└── File system access
4. Model Internals
├── Weights and activations
├── Tokenizer
├── Embedding space
└── Attention mechanics
5. Infrastructure
├── API endpoints
├── Authentication/authorization
├── Rate limiting
└── Logging and monitoring
6. Human Channels
├── Operators and administrators
├── Content moderators
├── End users (social engineering)
└── Developer team
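For systematic coverage tracking during an engagement, the six dimensions above can be expressed as a simple checklist structure. This is a minimal sketch: the `ATTACK_SURFACE_DIMENSIONS` table mirrors the outline above, and the `coverage_report` helper is illustrative, not part of any standard tooling.

```python
# Illustrative coverage tracker for the six attack surface dimensions.
ATTACK_SURFACE_DIMENSIONS = {
    "input_channels": ["direct user input", "indirect data",
                       "multimodal inputs", "system configuration"],
    "data_channels": ["RAG knowledge bases", "training/fine-tuning data",
                      "user conversation history", "external data sources"],
    "capability_channels": ["tool/function calling", "code execution",
                            "API integrations", "file system access"],
    "model_internals": ["weights and activations", "tokenizer",
                        "embedding space", "attention mechanics"],
    "infrastructure": ["API endpoints", "authentication/authorization",
                       "rate limiting", "logging and monitoring"],
    "human_channels": ["operators and administrators", "content moderators",
                       "end users", "developer team"],
}

def coverage_report(assessed):
    """Return the surface elements not yet assessed, per dimension."""
    return {
        dim: [item for item in items if item not in assessed.get(dim, [])]
        for dim, items in ATTACK_SURFACE_DIMENSIONS.items()
    }

# After assessing only direct user input, every other element is still open.
gaps = coverage_report({"input_channels": ["direct user input"]})
```

A report like this makes it harder to declare an assessment complete while entire dimensions (model internals and human channels are the usual casualties) remain untouched.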
Mapping Methodology
Phase 1: System Decomposition
Break the AI system into its component parts:
class AISystemDecomposition:
    """Decompose an AI system into mappable components."""

    def __init__(self, system_name):
        self.system = system_name
        self.components = {
            "inputs": [],
            "processing": [],
            "outputs": [],
            "data_stores": [],
            "integrations": [],
            "humans": []
        }

    def map_input_channels(self):
        """Identify all input channels to the AI system."""
        channels = [
            {
                "name": "User chat interface",
                "type": "direct",
                "format": "natural language",
                "attacker_controllable": True,
                "filtered": True  # Has content filter
            },
            {
                "name": "Document upload",
                "type": "indirect",
                "format": "PDF, DOCX, TXT",
                "attacker_controllable": True,
                "filtered": False  # No injection filter on uploads
            },
            {
                "name": "RAG knowledge base",
                "type": "indirect",
                "format": "text chunks",
                "attacker_controllable": "partially",
                "filtered": False
            },
            {
                "name": "System prompt",
                "type": "configuration",
                "format": "text",
                "attacker_controllable": False,
                "filtered": False  # Trusted input
            }
        ]
        self.components["inputs"] = channels
        return channels

Phase 2: Trust Boundary Analysis
Identify where data crosses trust boundaries:
def map_trust_boundaries(system_components):
    """
    Identify trust boundaries in the AI system.
    Each boundary transition is a potential attack point.
    """
    boundaries = []

    # Boundary 1: External -> Application
    boundaries.append({
        "name": "User Input Boundary",
        "from_zone": "untrusted (external)",
        "to_zone": "application layer",
        "data_crossing": ["user messages", "uploaded files"],
        "controls": ["input validation", "content filter"],
        "attack_types": ["prompt injection", "file upload attacks"]
    })

    # Boundary 2: Application -> Model
    boundaries.append({
        "name": "Model Context Boundary",
        "from_zone": "application layer",
        "to_zone": "model context window",
        "data_crossing": ["system prompt", "user message",
                          "retrieved documents", "conversation history"],
        "controls": ["instruction hierarchy"],
        "attack_types": ["indirect injection via retrieved docs",
                         "context window manipulation"]
    })

    # Boundary 3: Model -> Tools
    boundaries.append({
        "name": "Tool Execution Boundary",
        "from_zone": "model decisions",
        "to_zone": "external systems",
        "data_crossing": ["tool calls", "API requests",
                          "database queries", "file operations"],
        "controls": ["tool allowlisting", "parameter validation"],
        "attack_types": ["tool abuse", "data exfiltration",
                         "privilege escalation"]
    })

    # Boundary 4: Data Store -> Application
    boundaries.append({
        "name": "Data Retrieval Boundary",
        "from_zone": "data stores",
        "to_zone": "model context",
        "data_crossing": ["knowledge base chunks",
                          "user history", "cached responses"],
        "controls": ["access control", "content sanitization"],
        "attack_types": ["knowledge base poisoning",
                         "data extraction"]
    })

    return boundaries

Phase 3: Data Flow Mapping
Trace how data moves through the system:
def map_data_flows(system):
    """Map all data flows through the AI system."""
    flows = []

    # User input flow
    flows.append({
        "name": "User Query Processing",
        "path": [
            "User Input",
            "Input Filter",
            "Query Embedding",
            "RAG Retrieval",
            "Context Assembly",
            "Model Inference",
            "Output Filter",
            "Response Delivery"
        ],
        "sensitive_data": ["user query (may contain PII)"],
        "injection_points": ["User Input", "RAG Retrieval"]
    })

    # RAG retrieval flow
    flows.append({
        "name": "Knowledge Retrieval",
        "path": [
            "Query Embedding",
            "Vector Store Search",
            "Chunk Retrieval",
            "Re-ranking",
            "Context Insertion"
        ],
        "sensitive_data": ["knowledge base content"],
        "injection_points": ["Vector Store (if poisoned)",
                             "Context Insertion (if unfiltered)"]
    })

    # Tool execution flow
    flows.append({
        "name": "Tool Call Execution",
        "path": [
            "Model Decision",
            "Tool Call Parsing",
            "Parameter Validation",
            "Tool Execution",
            "Result Processing",
            "Response Integration"
        ],
        "sensitive_data": ["tool parameters", "tool results"],
        "injection_points": ["Model Decision (via injection)",
                             "Parameter Validation (bypass)"]
    })

    return flows

Phase 4: Attack Vector Prioritization
def prioritize_attack_vectors(trust_boundaries, data_flows):
    """Prioritize attack vectors by risk and accessibility."""
    vectors = []
    for boundary in trust_boundaries:
        for attack_type in boundary["attack_types"]:
            vectors.append({
                "vector": attack_type,
                "boundary": boundary["name"],
                "accessibility": score_accessibility(boundary),
                "impact": score_impact(attack_type),
                "defense_coverage": score_defenses(boundary["controls"]),
                "priority": None  # Calculated below
            })

    # Calculate priority score
    for vector in vectors:
        vector["priority"] = (
            vector["accessibility"] * 0.3 +
            vector["impact"] * 0.4 +
            (1 - vector["defense_coverage"]) * 0.3
        )

    # Sort by priority (highest first)
    vectors.sort(key=lambda x: x["priority"], reverse=True)
    return vectors

Producing the Attack Surface Map
The final output is a structured document that guides red team operations:
Attack Surface Map: [Target System Name]
Date: [Assessment Date]
1. SYSTEM OVERVIEW
- Architecture type: RAG-augmented chatbot with tool access
- Model: [identified or suspected]
- Infrastructure: [cloud provider, deployment type]
2. INPUT CHANNELS (ranked by exploitability)
Priority 1: Document upload (no injection filtering)
Priority 2: Direct chat input (basic content filter)
Priority 3: RAG knowledge base (limited write access)
3. TRUST BOUNDARIES
[Diagram showing boundary transitions]
4. DATA FLOWS
[Diagram showing sensitive data movement]
5. TOOL CAPABILITIES
- Email sending (high impact)
- Database queries (data exposure risk)
- File operations (limited to uploads directory)
6. RECOMMENDED ATTACK SEQUENCE
1. Indirect injection via document upload
2. System prompt extraction via chat
3. Tool abuse for data exfiltration
4. Knowledge base poisoning for persistence
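The rankings in sections 2 and 6 of this map come from the Phase 4 priority scores. The scoring helpers referenced there (score_accessibility, score_impact, score_defenses) are engagement-specific and left undefined above; the sketch below shows one plausible scheme. The scales, impact table, and the "0.25 per control" heuristic are all assumptions, not a standard.

```python
# Illustrative scoring scheme; all values are assumptions on a 0-1 scale.
IMPACT_SCORES = {
    "prompt injection": 0.7,
    "data exfiltration": 0.9,
    "privilege escalation": 0.9,
    "tool abuse": 0.8,
}

def score_accessibility(boundary):
    """Boundaries reachable directly from the untrusted zone score highest."""
    return 1.0 if "untrusted" in boundary["from_zone"] else 0.5

def score_impact(attack_type):
    """Look up impact; default to a middling score for unlisted attacks."""
    return IMPACT_SCORES.get(attack_type, 0.6)

def score_defenses(controls):
    """Crude proxy: more documented controls, more coverage (capped at 1.0)."""
    return min(len(controls) * 0.25, 1.0)

# Worked example: prompt injection across the User Input Boundary.
boundary = {
    "name": "User Input Boundary",
    "from_zone": "untrusted (external)",
    "controls": ["input validation", "content filter"],
}
priority = (score_accessibility(boundary) * 0.3
            + score_impact("prompt injection") * 0.4
            + (1 - score_defenses(boundary["controls"])) * 0.3)
# 1.0 * 0.3 + 0.7 * 0.4 + 0.5 * 0.3 = 0.73
```

Whatever scheme is used, the point is consistency: the same rubric applied across every boundary, so the final ranking reflects relative risk rather than assessor mood.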
Related Topics
- Target Profiling — Initial reconnaissance
- Model Identification — Identifying the model
- API Enumeration — Discovering API capabilities
- Defense Taxonomy — Understanding what defenses to expect
Exercise: An AI system accepts user chat (filtered), processes uploaded documents (unfiltered), and has access to email-sending tools. Which attack path should a red team prioritize?
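One way to reason about this scenario is to apply the Phase 4 priority formula to each candidate entry path. The scores below are illustrative assumptions: both paths reach the same email-capable model (hence equal impact), and the only difference is the defensive coverage at the entry point.

```python
# Hypothetical (accessibility, impact, defense_coverage) scores, 0-1 scale.
# Impact is high on both paths because a compromised model can invoke
# the email-sending tool either way.
paths = {
    "indirect injection via document upload": (1.0, 0.9, 0.0),  # unfiltered
    "direct injection via chat":              (1.0, 0.9, 0.5),  # filtered
}

def priority(accessibility, impact, defense_coverage):
    """Weighted score from Phase 4 (weights 0.3 / 0.4 / 0.3)."""
    return accessibility * 0.3 + impact * 0.4 + (1 - defense_coverage) * 0.3

ranked = sorted(paths, key=lambda p: priority(*paths[p]), reverse=True)
# The unfiltered upload path ranks first: same impact, weaker defenses.
```

With equal accessibility and impact, the unfiltered channel wins on defense coverage alone, which matches the recommended attack sequence above: start with indirect injection via document upload.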