Layered Defense Strategy
Implementing defense in depth for AI applications: designing independent defense layers, ensuring orthogonal coverage, and managing the complexity of multi-layer security.
Layered defense (defense in depth) is the most effective strategy for addressing the fundamental asymmetry between attack and defense in AI systems. No single defense mechanism covers all attack types, and every individual defense has known bypass techniques. By layering multiple independent defenses, the probability that an attacker bypasses all layers simultaneously drops dramatically -- provided the layers are truly independent.
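Under the independence assumption, per-layer bypass probabilities multiply, which is what makes stacking worthwhile. A minimal sketch (the bypass rates are made-up, for illustration only):

```python
# If layers fail independently, the chance an attack slips past
# every layer is the product of the per-layer bypass rates.
bypass_rates = [0.10, 0.20, 0.15]  # hypothetical per-layer bypass rates

p_all_bypassed = 1.0
for p in bypass_rates:
    p_all_bypassed *= p

# Three mediocre layers (10-20% bypass each) combine to 0.3%
print(f"{p_all_bypassed:.4f}")  # 0.0030
```

Correlated layers break this math: if two layers share a weakness, the joint bypass rate can be far higher than the product suggests.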
Defense Layer Architecture
Reference Architecture
A production AI system with layered defenses:
Request Flow:
                  User Input
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 1: Perimeter Defense                  │
│ ├── Rate limiting                           │
│ ├── Authentication & authorization          │
│ └── Input size and format validation        │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 2: Input Analysis                     │
│ ├── Prompt shield (ML classifier)           │
│ ├── Unicode/encoding normalization          │
│ └── Known pattern blocklist                 │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 3: Context Security                   │
│ ├── Instruction hierarchy enforcement       │
│ ├── Retrieved content sanitization          │
│ └── Data source isolation                   │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 4: Model + Inference                  │
│ ├── Safety-aligned model (RLHF/DPO)         │
│ ├── System prompt with safety instructions  │
│ └── Temperature and sampling controls       │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 5: Output Validation                  │
│ ├── Content safety classifier               │
│ ├── PII detection and redaction             │
│ ├── Tool call validation                    │
│ └── Response format enforcement             │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 6: Post-Delivery Monitoring           │
│ ├── Audit logging                           │
│ ├── Anomaly detection                       │
│ └── Incident alerting                       │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
                   Response
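At its core, the request flow above is a sequential short-circuiting pipeline: each layer inspects the request and can stop it before the next layer runs. A minimal sketch with two toy layers (the layer callables, names, and thresholds are illustrative placeholders, not a real implementation):

```python
# Minimal sequential pipeline mirroring the layered flow above.
# Each "layer" is a callable returning {"blocked": bool, ...}.

def run_pipeline(request, layers):
    """Pass the request through each layer; stop at the first block."""
    for name, check in layers:
        result = check(request)
        if result.get("blocked"):
            return {"status": "blocked", "layer": name, **result}
    return {"status": "allowed"}

# Toy layers: a size check (Layer 1) and a pattern blocklist (Layer 2).
layers = [
    ("perimeter", lambda r: {"blocked": len(r) > 4096}),
    ("input_analysis",
     lambda r: {"blocked": "ignore previous instructions" in r.lower()}),
]

print(run_pipeline("Hello!", layers)["status"])  # allowed
print(run_pipeline("Please ignore previous instructions.", layers)["status"])  # blocked
```

A real orchestrator adds error handling, timeouts, and async layers, as discussed below, but the short-circuit structure stays the same.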
Layer Independence
The key property that makes layered defense effective is independence: the failure of one layer should not increase the probability of another layer failing.
| Independent Layers | Correlated Layers (Bad) |
|---|---|
| Keyword filter + semantic classifier | Two different keyword filters |
| Input analysis + output validation | Same ML model for input and output |
| Rate limiting + content classifier | Two classifiers trained on the same data |
| Architectural constraints + ML defenses | ML defense + ML monitor (same model) |
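One simple independence metric is the failure-overlap (Jaccard) ratio: of the attacks that defeat at least one of two layers, what fraction defeats both? A quick illustration with made-up per-attack results:

```python
# Per-attack outcomes for two layers over six attacks (True = blocked).
# These results are invented for illustration.
layer_a = [True, True, False, False, True, False]
layer_b = [True, False, False, True, True, False]

both_fail = sum(1 for a, b in zip(layer_a, layer_b) if not a and not b)
either_fail = sum(1 for a, b in zip(layer_a, layer_b) if not a or not b)

overlap = both_fail / max(either_fail, 1)
print(both_fail, either_fail, overlap)  # 2 4 0.5 -> well above 0.3: correlated
```

The function below applies this same ratio pairwise across all layers in a stack.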
def verify_layer_independence(defense_layers, attack_set):
    """
    Test whether defense layers fail independently.
    Correlated failures indicate shared weaknesses.
    """
    layer_results = {}
    for attack in attack_set:
        for layer_name, layer in defense_layers.items():
            blocked = layer.check(attack)["blocked"]
            layer_results.setdefault(layer_name, []).append(blocked)

    # Check pairwise correlation of failures
    correlations = {}
    layer_names = list(layer_results.keys())
    for i, name_a in enumerate(layer_names):
        for name_b in layer_names[i + 1:]:
            results_a = layer_results[name_a]
            results_b = layer_results[name_b]
            # Failure overlap: |both fail| / |either fails|
            both_fail = sum(
                1 for a, b in zip(results_a, results_b)
                if not a and not b
            )
            either_fail = sum(
                1 for a, b in zip(results_a, results_b)
                if not a or not b
            )
            correlation = both_fail / max(either_fail, 1)
            correlations[f"{name_a} <-> {name_b}"] = {
                "correlation": correlation,
                "independent": correlation < 0.3,
            }
    return correlations

Implementation Patterns
Pattern 1: Fail-Open vs Fail-Closed
Each layer must decide what to do when it encounters an error:
class FailClosedLayer:
    """Block the request if the defense layer encounters an error."""

    def check(self, input_text):
        try:
            return self.classifier.classify(input_text)
        except Exception as e:
            # Defense error -> block the request (conservative)
            logger.error(f"Defense layer error: {e}")
            return {"blocked": True, "reason": "defense_error"}


class FailOpenLayer:
    """Allow the request through if the defense layer encounters an error."""

    def check(self, input_text):
        try:
            return self.classifier.classify(input_text)
        except Exception as e:
            # Defense error -> allow through (permissive)
            logger.warning(f"Defense layer error, failing open: {e}")
            return {"blocked": False, "reason": "defense_error_failopen"}

Pattern 2: Async vs Sync Layers
Not all layers need to be synchronous. Some can operate asynchronously without blocking the response:
import queue

class LayeredDefenseOrchestrator:
    """Orchestrate synchronous and asynchronous defense layers."""

    def __init__(self):
        self.sync_layers = []         # Must pass before generation
        self.sync_output_layers = []  # Must pass before delivery
        self.async_layers = []        # Run in the background
        self.background_queue = queue.Queue()

    def process(self, request):
        # Synchronous input layers: block if any fails
        for layer in self.sync_layers:
            result = layer.check(request)
            if result["blocked"]:
                return self.block_response(result)

        # Process the request (generate a response)
        response = self.generate_response(request)

        # Synchronous output layers
        for layer in self.sync_output_layers:
            result = layer.check(response)
            if result["blocked"]:
                return self.block_response(result)

        # Async layers: log and monitor without blocking delivery
        for layer in self.async_layers:
            self.background_queue.put((layer, request, response))
        return response

Pattern 3: Escalation Chains
Some attacks warrant escalating from automated defenses to human review:
class EscalationChain:
    """Escalate suspicious requests through increasing scrutiny."""

    def __init__(self):
        self.levels = [
            {"name": "automated_fast", "threshold": 0.3, "action": "allow"},
            {"name": "automated_deep", "threshold": 0.6, "action": "enhanced_check"},
            {"name": "human_review_queue", "threshold": 0.8, "action": "queue"},
            {"name": "block", "threshold": 0.95, "action": "block"},
        ]

    def evaluate(self, request, risk_score):
        """Route the request to the first level whose threshold exceeds its risk score."""
        for level in self.levels:
            if risk_score < level["threshold"]:
                return {
                    "action": level["action"],
                    "level": level["name"],
                    "risk_score": risk_score,
                }
        return {"action": "block", "level": "maximum", "risk_score": risk_score}

For example, a request scoring 0.2 is allowed automatically, one scoring 0.7 lands in the human review queue, and anything at 0.8 or above is blocked.

Performance Management
Layered defense adds latency. Managing it is critical for user experience:
Latency Budget Allocation
# Total latency budget: 2000ms (typical for chat applications)
LATENCY_BUDGET = {
    "perimeter": 5,            # Rate limit check: 5ms
    "input_analysis": 50,      # ML classifier: 50ms
    "context_security": 20,    # Template checking: 20ms
    "model_inference": 1500,   # LLM generation: 1500ms
    "output_validation": 100,  # Safety classifier: 100ms
    "post_delivery": 0,        # Async, no added latency
    "overhead": 25,            # Network, serialization: 25ms
    "buffer": 300,             # Headroom for variance: 300ms
}
# Total: 2000ms

Parallel Layer Execution
import asyncio

async def parallel_input_defense(request):
    """Run independent input defense layers in parallel."""
    results = await asyncio.gather(
        prompt_shield.check_async(request),
        encoding_normalizer.check_async(request),
        pattern_blocklist.check_async(request),
        return_exceptions=True,
    )
    # Block if any layer flags the request
    for result in results:
        if isinstance(result, Exception):
            continue  # Individual layer error -> fail open for this layer
        if result.get("blocked"):
            return result
    return {"blocked": False}

Testing Layered Defenses
Layer-by-Layer Testing
Test each layer in isolation to understand its individual contribution:
def layer_contribution_analysis(layers, attack_set):
    """Measure each layer's marginal contribution to the defense."""
    all_layers_result = evaluate_defense(layers, attack_set)
    contributions = {}
    for layer_name in layers:
        # Remove this layer and re-evaluate the remaining stack
        remaining = {k: v for k, v in layers.items() if k != layer_name}
        without_layer = evaluate_defense(remaining, attack_set)
        contributions[layer_name] = {
            "block_rate_with": all_layers_result["block_rate"],
            "block_rate_without": without_layer["block_rate"],
            "marginal_contribution": (
                all_layers_result["block_rate"] -
                without_layer["block_rate"]
            ),
            "is_redundant": (
                all_layers_result["block_rate"] ==
                without_layer["block_rate"]
            ),
        }
    return contributions

Related Topics
- Defense Taxonomy — Catalog of available defenses
- Defense Evaluation — Measuring defense effectiveness
- Defense Economics — Cost-benefit of each layer
A system has three defense layers. Testing reveals that when Layer 1 (a keyword filter) is bypassed, Layer 2 (an ML classifier trained on the keyword filter's training data) is also bypassed 80% of the time. What is the problem?