Defense-in-Depth for LLM Apps
A layered defense strategy for AI applications covering the network, application, model, and output layers: how each layer contributes, and why single-layer defense fails.
Defense-in-depth is the principle that no single security control should be trusted alone. For LLM applications, this means building multiple independent defense layers so that when one layer fails -- and it will -- others catch the threat.
The Four Defense Layers
Layer 1: Network and Infrastructure
This layer operates before any AI-specific processing occurs. It handles authentication, authorization, rate limiting, and transport security.
| Control | Purpose | Attack It Mitigates |
|---|---|---|
| Authentication | Verify user identity | Anonymous abuse |
| Rate limiting | Cap requests per user/time | Automated attacks, DoS |
| API key rotation | Limit exposure from leaked keys | Credential theft |
| TLS/mTLS | Encrypt transport | Traffic interception |
| IP allowlisting | Restrict access by network | Unauthorized access |
| WAF rules | Block known attack patterns | Common web attacks |
```python
# Rate limiting middleware example (in-memory sliding window)
from datetime import datetime, timedelta, timezone

class RateLimiter:
    def __init__(self, max_requests: int = 60, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests: dict[str, list[datetime]] = {}

    def check(self, user_id: str) -> bool:
        now = datetime.now(timezone.utc)
        user_requests = self.requests.get(user_id, [])
        # Drop entries that have aged out of the window
        user_requests = [r for r in user_requests if now - r < self.window]
        if len(user_requests) >= self.max_requests:
            return False
        user_requests.append(now)
        self.requests[user_id] = user_requests
        return True
```

What this layer cannot do: it cannot inspect content for semantic threats. A perfectly crafted jailbreak sent at 1 request per minute passes through every network-layer control.
Layer 2: Application (Input Processing)
This layer analyzes and sanitizes inputs before they reach the model.
| Control | Purpose | Attack It Mitigates |
|---|---|---|
| Input size limits | Prevent context window abuse | Token exhaustion, attention dilution |
| Prompt shield | Detect injection attempts | Direct and indirect injection |
| Content safety API | Flag harmful input content | Harmful request attempts |
| Input sanitization | Remove/escape special tokens | Delimiter escape, format mimicry |
| Session management | Track multi-turn behavior | Gradual escalation attacks |
```python
class InputProcessor:
    """Runs each input check in order; any failure blocks the request.

    PromptShield, ContentSafetyClient, ProcessResult, Session, count_tokens,
    THRESHOLD, and ESCALATION_THRESHOLD are illustrative placeholders.
    """

    def __init__(self):
        self.prompt_shield = PromptShield()
        self.content_safety = ContentSafetyClient()
        self.max_input_tokens = 4096

    def process(self, user_input: str, session: Session) -> ProcessResult:
        # Size check
        if count_tokens(user_input) > self.max_input_tokens:
            return ProcessResult.blocked("Input exceeds maximum length")
        # Prompt injection detection
        if self.prompt_shield.is_injection(user_input):
            return ProcessResult.blocked("Input flagged as injection")
        # Content safety check
        safety = self.content_safety.analyze(user_input)
        if safety.max_severity > THRESHOLD:
            return ProcessResult.blocked(f"Content safety: {safety.category}")
        # Session-level behavior analysis
        session.add_message(user_input)
        if session.escalation_score > ESCALATION_THRESHOLD:
            return ProcessResult.blocked("Behavioral escalation detected")
        return ProcessResult.allowed(user_input)
```

What this layer cannot do: it cannot prevent the model from generating harmful content in response to benign-looking inputs, and it cannot detect attacks the shield model was not trained on.
Layer 3: Model (Inference Controls)
This layer constrains the model's behavior during generation.
| Control | Purpose | Attack It Mitigates |
|---|---|---|
| System prompt hardening | Explicit behavioral constraints | Instruction override |
| Instruction hierarchy | Prioritize system over user instructions | Priority manipulation |
| Temperature limits | Reduce output randomness | Stochastic bypass |
| Token budget limits | Cap output length | Data exfiltration |
| Stop sequences | Halt generation at boundary markers | Runaway generation |
| Tool permission scoping | Restrict available tools | Tool abuse, privilege escalation |
What this layer cannot do: It relies on the model following instructions, which is exactly what prompt injection attacks subvert. Model-layer controls are necessary but never sufficient alone.
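Several of the model-layer controls above can be applied mechanically when constructing the inference request. The sketch below assumes a generic chat-completion-style API; the parameter names, the `ALLOWED_TOOLS` set, and the stop marker are illustrative, not tied to any specific provider:

```python
# Model-layer inference controls as a request builder (illustrative sketch)
ALLOWED_TOOLS = {"search_docs", "get_weather"}  # hypothetical scoped tool set

def build_request(system_prompt: str, user_msg: str,
                  requested_tools: list[str],
                  temperature: float = 0.7) -> dict:
    return {
        "messages": [
            # Instruction hierarchy: system instructions precede user input
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
        "temperature": min(max(temperature, 0.0), 0.5),  # clamp randomness
        "max_tokens": 1024,                              # cap output length
        "stop": ["###END###"],                           # halt at boundary marker
        # Tool permission scoping: silently drop anything outside the allowlist
        "tools": [t for t in requested_tools if t in ALLOWED_TOOLS],
    }
```

Note that every constraint here still depends on the serving stack honoring it; the system prompt itself is only a request, which is why the layer is necessary but not sufficient.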
Layer 4: Output (Post-Processing)
This layer analyzes and sanitizes the model's output before returning it to the user.
| Control | Purpose | Attack It Mitigates |
|---|---|---|
| Content classifier | Detect harmful generated content | Jailbreak success |
| PII detection/redaction | Remove personal data from output | Data leakage |
| LLM judge | Nuanced policy evaluation | Subtle policy violations |
| Schema validation | Ensure output matches expected format | Injection via structured output |
| Citation verification | Validate referenced sources | Hallucinated citations |
| Code execution sandbox | Isolate generated code | Malicious code generation |
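As one concrete slice of this layer, PII redaction can be sketched with simple patterns. Production systems would use trained detectors; the two regexes below are illustrative stand-ins covering only email addresses and US SSN-shaped strings:

```python
import re

# Illustrative PII patterns -- real deployments need broader, tested detectors
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(model_output: str) -> str:
    """Replace detected PII spans before the output leaves the system."""
    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", model_output)
    redacted = SSN_RE.sub("[REDACTED_SSN]", redacted)
    return redacted
```

Because this runs after generation, it catches leakage regardless of how the model was induced to produce it, which is exactly the independence the layered model requires.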
Why Single-Layer Defense Fails
Consider this attack progression against a system with only input filtering:
| Step | Attacker Action | Input Filter Result | Outcome |
|---|---|---|---|
| 1 | Send direct injection | Blocked | Defense holds |
| 2 | Paraphrase injection | Blocked | Defense holds |
| 3 | Use truncation bypass | Passes | No other layers to catch it |
| 4 | Model generates harmful output | No output filter | Harmful content delivered |
Now the same attack against a defense-in-depth system:
| Step | Attacker Action | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Outcome |
|---|---|---|---|---|---|---|
| 3 | Truncation bypass | Passes | Passes | System prompt may resist | Output classifier catches harmful output | Defense holds |
Even when one layer fails, the remaining layers maintain protection.
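The behavior in the tables above can be sketched as a pipeline in which every layer must independently allow the request; the individual layer checks here are hypothetical stand-ins for the real controls:

```python
# Defense-in-depth as sequential, independent checks (illustrative sketch)
def defense_pipeline(request: dict, layers: list) -> str:
    """Serve a request only if every layer allows it; report the first block."""
    for name, check in layers:
        if not check(request):
            return f"blocked at {name}"
    return "allowed"

layers = [
    # Stand-in checks -- each real layer would be far richer
    ("network", lambda r: r.get("authenticated", False)),
    ("input", lambda r: "ignore previous instructions" not in r["text"].lower()),
    ("output", lambda r: True),  # output classifier would run post-generation
]
```

The key property is that the layers share no failure mode: a bypass that fools the input check has no bearing on whether the output classifier fires.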
Defense-in-Depth Checklist
Use this checklist to evaluate an application's defense coverage:
| Layer | Control | Present? | Notes |
|---|---|---|---|
| Network | Authentication required | | |
| Network | Rate limiting configured | | |
| Network | API keys rotated regularly | | |
| Application | Input size limits enforced | | |
| Application | Prompt shield deployed | | |
| Application | Content safety API active | | |
| Application | Multi-turn tracking enabled | | |
| Model | System prompt hardened | | |
| Model | Tool permissions scoped | | |
| Model | Output length limited | | |
| Output | Content classifier active | | |
| Output | PII detection/redaction | | |
| Output | Structured output validation | | |
| Cross-cutting | Logging and monitoring | | |
| Cross-cutting | Alerting on anomalies | | |
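A checklist like this can also be tracked as data, which makes coverage easy to report. The control names below mirror a subset of the table; which controls you track, and their states, are deployment-specific assumptions:

```python
# Checklist coverage as a simple metric (illustrative control states)
checklist = {
    "network": {"authentication": True, "rate_limiting": True, "key_rotation": False},
    "application": {"size_limits": True, "prompt_shield": False},
    "output": {"content_classifier": False},
}

def coverage(checklist: dict) -> float:
    """Fraction of tracked controls that are present across all layers."""
    controls = [v for layer in checklist.values() for v in layer.values()]
    return sum(controls) / len(controls)
```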
Related Topics
- Guardrails & Safety Layer Architecture - Detailed architecture of the application and output layers
- Runtime Monitoring & Anomaly Detection - The cross-cutting monitoring layer
- Rate Limiting, Sandboxing & Execution Controls - Infrastructure-layer controls
- Red Team Findings to Remediation - Translating defense gaps into actionable findings
- AI System Architecture for Red Teamers - Component-level view of the systems these defenses protect
Check Your Understanding
An LLM application has a strong prompt shield on input but no output filtering. An attacker uses a benign-looking input that the shield allows, and the model generates harmful content. What defense-in-depth principle was violated?