AI System Architecture for Red Teamers
How AI systems are deployed in production — model API, prompt templates, orchestration, tools, memory, and guardrails — with attack surface analysis at each layer.
Beyond the Model: AI Systems in Production
When red teaming AI, you are almost never attacking a bare model. Production AI systems are layered architectures with multiple components, each introducing its own attack surface. Understanding this architecture is step one of any engagement.
A typical production AI system includes:
┌─────────────────────────────────────────────────┐
│ User Interface │
├─────────────────────────────────────────────────┤
│ Input Guardrails │
├─────────────────────────────────────────────────┤
│ Orchestration Layer │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Prompt │ │ Memory │ │ Tool/API │ │
│ │ Template │ │ Store │ │ Connectors │ │
│ └──────────┘ └──────────┘ └──────────────┘ │
├─────────────────────────────────────────────────┤
│ Model API │
├─────────────────────────────────────────────────┤
│ Output Guardrails │
├─────────────────────────────────────────────────┤
│ User Interface │
└─────────────────────────────────────────────────┘
Component Breakdown
1. Model API
The LLM itself, accessed via API (OpenAI, Anthropic, local deployment, etc.).
| Aspect | Details |
|---|---|
| What it does | Generates text based on prompt input |
| Trust level | High — treated as the "brain" of the system |
| Attack surface | Prompt injection, jailbreaking, parameter manipulation |
| Key parameters | model, temperature, max_tokens, system prompt |
See Anatomy of an LLM API Call for a deep dive.
2. Prompt Template
The system prompt and template that shapes model behavior for the specific use case.
SYSTEM_PROMPT = """You are a customer support agent for Acme Corp.
Rules:
- Only answer questions about Acme products
- Never reveal internal pricing formulas
- Always be polite and professional
- If unsure, escalate to a human agent
Context: {retrieved_context}
"""| Aspect | Details |
|---|---|
| What it does | Defines the model's role, constraints, and available context |
| Trust level | Developer-controlled, should be treated as confidential |
| Attack surface | System prompt extraction, instruction override, template injection |
3. Orchestration Layer
The application logic that coordinates between the model, tools, memory, and user interface. Frameworks like LangChain, LlamaIndex, and custom code.
| Aspect | Details |
|---|---|
| What it does | Routes requests, manages conversation flow, handles tool calls |
| Trust level | Application code — varies by implementation quality |
| Attack surface | Logic bugs, improper input validation, state manipulation |
4. Tool Connectors
External tools and APIs the model can invoke — databases, web search, code execution, file systems, third-party APIs.
| Aspect | Details |
|---|---|
| What it does | Extends model capabilities with real-world actions |
| Trust level | HIGH RISK — tools can have side effects (write data, send emails, execute code) |
| Attack surface | Tool invocation manipulation, parameter injection, privilege escalation |
5. Memory / State
Conversation history, user preferences, session state, and long-term memory stores.
| Aspect | Details |
|---|---|
| What it does | Maintains context across interactions |
| Trust level | Contains prior model outputs and user data — mixed trust |
| Attack surface | Memory poisoning, context manipulation, cross-session attacks |
6. Guardrails
Input and output filters that enforce safety and business rules.
| Type | Placement | Techniques |
|---|---|---|
| Input guardrails | Before model | Content classifiers, keyword filters, prompt injection detectors |
| Output guardrails | After model | Toxicity filters, PII redaction, format validation |
| Structural guardrails | Orchestration | Rate limiting, output length limits, tool call validation |
Trust Boundaries
A trust boundary exists wherever data moves between components with different trust levels:
| Boundary | From → To | Key Risk |
|---|---|---|
| User → System | Untrusted → Trusted | Prompt injection |
| RAG retrieval → Prompt | Semi-trusted → Trusted | Indirect injection |
| Model → Tool call | Model-generated → Executed | Arbitrary tool invocation |
| Tool result → Model | External data → Trusted context | Result injection |
| Memory → Prompt | Stored data → Active context | Persistent injection |
Quick-Reference: Attack Surface by Layer
| Layer | Primary Attacks | Impact |
|---|---|---|
| User interface | Input crafting, encoding tricks | Low — filtered by downstream layers |
| Input guardrails | Filter bypass, evasion | Medium — gains access to model |
| Prompt template | System prompt extraction, override | Medium — changes model behavior |
| Orchestration | Logic exploitation, state manipulation | High — can alter control flow |
| Model API | Jailbreaking, prompt injection | High — controls generated output |
| Tool connectors | Parameter injection, unauthorized calls | Critical — real-world side effects |
| Memory | Poisoning, cross-session injection | High — persistent compromise |
| Output guardrails | Output manipulation, encoding bypass | Medium — evades safety filters |
Related Topics
- Anatomy of an LLM API Call — deep dive into the model interface layer
- Agent Architectures & Tool Use Patterns — orchestration patterns and tool use
- Common AI Deployment Patterns — how these components are configured for different use cases
- Lab: Mapping an AI System's Attack Surface — hands-on practice
References
- "OWASP Top 10 for LLM Applications" - OWASP (2025) - Industry-standard classification of LLM application security risks including insecure plugin design and excessive agency
- "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" - Greshake et al. (2023) - Research demonstrating how trust boundaries in AI systems can be exploited through indirect injection
- "Architectural Risk Analysis of Large Language Models" - Trail of Bits (2024) - Systematic analysis of attack surfaces across production AI system architectures
- "The Dual LLM Pattern for Building AI Assistants That Can Resist Prompt Injection" - Simon Willison (2023) - Practical architectural pattern for separating trusted and untrusted contexts in AI systems
Which component in a production AI system typically has the highest-impact attack surface?