Layered Defense Strategy
Implementing defense in depth for AI applications: designing independent defense layers, ensuring orthogonal coverage, and managing the complexity of multi-layer security.
Layered defense (defense in depth) is the most effective strategy for addressing the fundamental asymmetry between attack and defense in AI systems. No single defense mechanism covers all attack types, and every individual defense has known bypass techniques. By layering multiple independent defenses, the probability that an attacker bypasses all layers simultaneously drops dramatically -- provided the layers are truly independent.
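Under the independence assumption, per-layer bypass probabilities multiply, which is what makes stacking worthwhile. A minimal sketch (the bypass rates are made-up, for illustration only):

```python
# If layers fail independently, the chance an attack slips past
# every layer is the product of the per-layer bypass rates.
bypass_rates = [0.10, 0.20, 0.15]  # hypothetical per-layer bypass rates

p_all_bypassed = 1.0
for p in bypass_rates:
    p_all_bypassed *= p

# Three mediocre layers (10-20% bypass each) combine to 0.3%
print(f"{p_all_bypassed:.4f}")  # 0.0030
```

Correlated layers break this math: if two layers share a weakness, the joint bypass rate can be far higher than the product suggests.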
Defense Layer Architecture
Reference Architecture
A production AI system with layered defenses:
Request Flow:
                  User Input
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 1: Perimeter Defense                  │
│ ├── Rate limiting                           │
│ ├── Authentication & authorization          │
│ └── Input size and format validation        │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 2: Input Analysis                     │
│ ├── Prompt shield (ML classifier)           │
│ ├── Unicode/encoding normalization          │
│ └── Known pattern blocklist                 │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 3: Context Security                   │
│ ├── Instruction hierarchy enforcement       │
│ ├── Retrieved content sanitization          │
│ └── Data source isolation                   │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 4: Model + Inference                  │
│ ├── Safety-aligned model (RLHF/DPO)         │
│ ├── System prompt with safety instructions  │
│ └── Temperature and sampling controls       │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 5: Output Validation                  │
│ ├── Content safety classifier               │
│ ├── PII detection and redaction             │
│ ├── Tool call validation                    │
│ └── Response format enforcement             │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ Layer 6: Post-Delivery Monitoring           │
│ ├── Audit logging                           │
│ ├── Anomaly detection                       │
│ └── Incident alerting                       │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
                   Response
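At its core, the request flow above is a sequential short-circuiting pipeline: each layer inspects the request and can stop it before the next layer runs. A minimal sketch with two toy layers (the layer callables, names, and thresholds are illustrative placeholders, not a real implementation):

```python
# Minimal sequential pipeline mirroring the layered flow above.
# Each "layer" is a callable returning {"blocked": bool, ...}.

def run_pipeline(request, layers):
    """Pass the request through each layer; stop at the first block."""
    for name, check in layers:
        result = check(request)
        if result.get("blocked"):
            return {"status": "blocked", "layer": name, **result}
    return {"status": "allowed"}

# Toy layers: a size check (Layer 1) and a pattern blocklist (Layer 2).
layers = [
    ("perimeter", lambda r: {"blocked": len(r) > 4096}),
    ("input_analysis",
     lambda r: {"blocked": "ignore previous instructions" in r.lower()}),
]

print(run_pipeline("Hello!", layers)["status"])  # allowed
print(run_pipeline("Please ignore previous instructions.", layers)["status"])  # blocked
```

A real orchestrator adds error handling, timeouts, and async layers, as discussed below, but the short-circuit structure stays the same.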
Layer Independence
The key property that makes layered defense effective is independence: the failure of one layer should not increase the probability of another layer failing.
| Independent Layers | Correlated Layers (Bad) |
|---|---|
| Keyword filter + semantic classifier | Two different keyword filters |
| Input analysis + output validation | Same ML model for input and output |
| Rate limiting + content classifier | Two classifiers trained on the same data |
| Architectural constraints + ML defenses | ML defense + ML monitor (same model) |
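One simple independence metric is the failure-overlap (Jaccard) ratio: of the attacks that defeat at least one of two layers, what fraction defeats both? A quick illustration with made-up per-attack results:

```python
# Per-attack outcomes for two layers over six attacks (True = blocked).
# These results are invented for illustration.
layer_a = [True, True, False, False, True, False]
layer_b = [True, False, False, True, True, False]

both_fail = sum(1 for a, b in zip(layer_a, layer_b) if not a and not b)
either_fail = sum(1 for a, b in zip(layer_a, layer_b) if not a or not b)

overlap = both_fail / max(either_fail, 1)
print(both_fail, either_fail, overlap)  # 2 4 0.5 -> well above 0.3: correlated
```

The function below applies this same ratio pairwise across all layers in a stack.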
def verify_layer_independence(defense_layers, attack_set):
    """
    Test whether defense layers fail independently.
    Correlated failures indicate shared weaknesses.
    """
    layer_results = {}
    for attack in attack_set:
        for layer_name, layer in defense_layers.items():
            blocked = layer.check(attack)["blocked"]
            layer_results.setdefault(layer_name, []).append(blocked)

    # Check pairwise correlation of failures
    correlations = {}
    layer_names = list(layer_results.keys())
    for i, name_a in enumerate(layer_names):
        for name_b in layer_names[i + 1:]:
            results_a = layer_results[name_a]
            results_b = layer_results[name_b]
            # Failure overlap: |both fail| / |either fails|
            both_fail = sum(
                1 for a, b in zip(results_a, results_b)
                if not a and not b
            )
            either_fail = sum(
                1 for a, b in zip(results_a, results_b)
                if not a or not b
            )
            correlation = both_fail / max(either_fail, 1)
            correlations[f"{name_a} <-> {name_b}"] = {
                "correlation": correlation,
                "independent": correlation < 0.3,
            }
    return correlations

Implementation Patterns
Pattern 1: Fail-Open vs Fail-Closed
Each layer must decide what to do when it encounters an error:
class FailClosedLayer:
    """Block the request if the defense layer encounters an error."""

    def check(self, input_text):
        try:
            return self.classifier.classify(input_text)
        except Exception as e:
            # Defense error -> block the request (conservative)
            logger.error(f"Defense layer error: {e}")
            return {"blocked": True, "reason": "defense_error"}


class FailOpenLayer:
    """Allow the request through if the defense layer encounters an error."""

    def check(self, input_text):
        try:
            return self.classifier.classify(input_text)
        except Exception as e:
            # Defense error -> allow through (permissive)
            logger.warning(f"Defense layer error, failing open: {e}")
            return {"blocked": False, "reason": "defense_error_failopen"}

Pattern 2: Async vs Sync Layers
Not all layers need to be synchronous. Some can operate asynchronously without blocking the response:
import queue

class LayeredDefenseOrchestrator:
    """Orchestrate synchronous and asynchronous defense layers."""

    def __init__(self):
        self.sync_layers = []         # Must pass before generation
        self.sync_output_layers = []  # Must pass before delivery
        self.async_layers = []        # Run in the background
        self.background_queue = queue.Queue()

    def process(self, request):
        # Synchronous input layers: block if any fails
        for layer in self.sync_layers:
            result = layer.check(request)
            if result["blocked"]:
                return self.block_response(result)

        # Process the request (generate a response)
        response = self.generate_response(request)

        # Synchronous output layers
        for layer in self.sync_output_layers:
            result = layer.check(response)
            if result["blocked"]:
                return self.block_response(result)

        # Async layers: log and monitor without blocking delivery
        for layer in self.async_layers:
            self.background_queue.put((layer, request, response))
        return response

Pattern 3: Escalation Chains
Some attacks warrant escalating from automated defenses to human review:
class EscalationChain:
    """Escalate suspicious requests through increasing scrutiny."""

    def __init__(self):
        self.levels = [
            {"name": "automated_fast", "threshold": 0.3, "action": "allow"},
            {"name": "automated_deep", "threshold": 0.6, "action": "enhanced_check"},
            {"name": "human_review_queue", "threshold": 0.8, "action": "queue"},
            {"name": "block", "threshold": 0.95, "action": "block"},
        ]

    def evaluate(self, request, risk_score):
        """Route the request to the first level whose threshold exceeds its risk score."""
        for level in self.levels:
            if risk_score < level["threshold"]:
                return {
                    "action": level["action"],
                    "level": level["name"],
                    "risk_score": risk_score,
                }
        return {"action": "block", "level": "maximum", "risk_score": risk_score}

For example, a request scoring 0.2 is allowed automatically, one scoring 0.7 lands in the human review queue, and anything at 0.8 or above is blocked.

Performance Management
Layered defense adds latency. Managing it is critical for user experience:
Latency Budget Allocation
# Total latency budget: 2000ms (typical for chat applications)
LATENCY_BUDGET = {
    "perimeter": 5,            # Rate limit check: 5ms
    "input_analysis": 50,      # ML classifier: 50ms
    "context_security": 20,    # Template checking: 20ms
    "model_inference": 1500,   # LLM generation: 1500ms
    "output_validation": 100,  # Safety classifier: 100ms
    "post_delivery": 0,        # Async, no added latency
    "overhead": 25,            # Network, serialization: 25ms
    "buffer": 300,             # Headroom for variance: 300ms
}
# Total: 2000ms

Parallel Layer Execution
import asyncio

async def parallel_input_defense(request):
    """Run independent input defense layers in parallel."""
    results = await asyncio.gather(
        prompt_shield.check_async(request),
        encoding_normalizer.check_async(request),
        pattern_blocklist.check_async(request),
        return_exceptions=True,
    )
    # Block if any layer flags the request
    for result in results:
        if isinstance(result, Exception):
            continue  # Individual layer error -> fail open for this layer
        if result.get("blocked"):
            return result
    return {"blocked": False}

Testing Layered Defenses
Layer-by-Layer Testing
Test each layer in isolation to understand its individual contribution:
def layer_contribution_analysis(layers, attack_set):
    """Measure each layer's marginal contribution to the defense."""
    all_layers_result = evaluate_defense(layers, attack_set)
    contributions = {}
    for layer_name in layers:
        # Remove this layer and re-evaluate the remaining stack
        remaining = {k: v for k, v in layers.items() if k != layer_name}
        without_layer = evaluate_defense(remaining, attack_set)
        contributions[layer_name] = {
            "block_rate_with": all_layers_result["block_rate"],
            "block_rate_without": without_layer["block_rate"],
            "marginal_contribution": (
                all_layers_result["block_rate"] -
                without_layer["block_rate"]
            ),
            "is_redundant": (
                all_layers_result["block_rate"] ==
                without_layer["block_rate"]
            ),
        }
    return contributions

Related Topics
- Defense Taxonomy — Catalog of available defenses
- Defense Evaluation — Measuring defense effectiveness
- Defense Economics — Cost-benefit of each layer
A system has three defense layers. Testing reveals that when Layer 1 (a keyword filter) is bypassed, Layer 2 (an ML classifier trained on the keyword filter's training data) is also bypassed 80% of the time. What is the problem?