Model Architecture Attack Vectors
How model architecture decisions create exploitable attack surfaces, including attention mechanisms, MoE routing, KV cache, and context window vulnerabilities.
Every architectural choice in a modern LLM -- from the attention mechanism to the inference optimization stack -- creates potential attack surface. Unlike prompt-level attacks that exploit the model's learned behavior, architecture-level attacks exploit the mathematical and engineering foundations that the model is built on. These vulnerabilities exist regardless of how well the model is aligned.
The Architecture Attack Surface
Modern LLM deployments stack multiple architectural components, each introducing distinct vulnerability classes:
| Component | Attack Surface | Example Exploit |
|---|---|---|
| Attention mechanism | Quadratic complexity, positional encoding | Attention dilution, position hijacking |
| MoE routing | Expert selection logic, load balancing | Routing manipulation, expert poisoning |
| KV cache | Shared memory across requests | Cache poisoning, cross-tenant leakage |
| Context window | Fixed token budget, positional bias | Context stuffing, lost-in-the-middle |
| Quantization | Precision reduction | Safety degradation, adversarial precision loss |
| Speculative decoding | Draft model predictions | Draft model manipulation |
| Batching | Request grouping for throughput | Cross-request interference |
Attention Mechanism Vulnerabilities
The self-attention mechanism computes pairwise relationships between all tokens. This creates two fundamental attack vectors.
Attention Hijacking
An attacker can craft inputs that dominate attention scores, causing the model to disproportionately weight adversarial content over legitimate instructions:
```python
import math

import torch
import torch.nn.functional as F

# Attention score computation (simplified)
d_k = 64                                             # head dimension
Q, K, V = (torch.randn(16, d_k) for _ in range(3))   # query, key, value matrices
scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
# Adversarial tokens designed to produce high dot products with
# instruction-following query vectors will dominate attention
attn_weights = F.softmax(scores, dim=-1)
output = torch.matmul(attn_weights, V)
```

Positional Encoding Exploitation
Models using RoPE or ALiBi positional encodings exhibit systematic biases based on token position. Tokens near the beginning and end of the context receive disproportionate attention -- a property attackers exploit by placing injections at these high-attention positions.
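The property attackers rely on can be verified directly. Below is a minimal NumPy sketch of RoPE (illustrative, not any specific model's implementation): rotating query and key vectors by their absolute positions produces attention logits that depend only on the *relative* offset between tokens, which is why an injection's influence is tied to where it sits relative to the instructions it targets.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a single vector x (even dim)."""
    d = x.shape[0]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin             # 2D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# The attention logit depends only on the offset between positions:
# shifting both absolute positions by 100 leaves the score unchanged.
s1 = rope(q, 5) @ rope(k, 3)
s2 = rope(q, 105) @ rope(k, 103)
```

Here `s1` and `s2` are equal (up to float error), because each 2D rotation pair satisfies a relative-position identity; the absolute-position biases attackers exploit come from training dynamics layered on top of this mechanism, not from the rotation itself.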
Architecture Categories
This section covers six classes of architecture-level attacks, organized from foundational to advanced:
MoE Routing Manipulation
How Mixture-of-Experts routing decisions can be manipulated to activate specific experts, bypass safety-specialized experts, or exploit load balancing. See MoE Routing Manipulation.
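As a toy illustration of routing manipulation (a sketch, assuming a simple top-1 linear gate and an attacker who has recovered the gating matrix, e.g. via the supply-chain access described below): an input aligned with a target expert's gate row maximizes that expert's logit, forcing the router to select it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 4, 16
W_gate = rng.standard_normal((n_experts, d))   # toy gating matrix

def route_top1(x):
    """Top-1 expert selection from gate logits."""
    return int(np.argmax(W_gate @ x))

# Pick the max-norm gate row as the target; an input proportional to
# that row wins by Cauchy-Schwarz, so routing is forced to `target`.
# Assumes attacker knowledge of W_gate (toy threat model).
target = int(np.argmax(np.linalg.norm(W_gate, axis=1)))
x_adv = W_gate[target].copy()
forced = route_top1(x_adv)
# forced == target
```

Real MoE gates add noise and load-balancing losses, but the underlying lever -- shaping the hidden state that feeds the gate -- is the same.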
KV Cache Poisoning
Exploiting shared KV caches in multi-tenant deployments for cross-request data leakage and behavior manipulation. See KV Cache Poisoning.
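The failure mode can be sketched with a deliberately insecure toy (not any serving framework's actual cache; the tenant names and prompt are hypothetical): a prefix cache keyed by token content alone, with no tenant scoping, lets a co-tenant confirm another tenant's exact prompt by probing for cache hits.

```python
import hashlib

class SharedPrefixCache:
    """Toy KV prefix cache keyed by token content only -- the insecure
    pattern: no tenant id in the key, so entries are shared globally."""
    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        return hashlib.sha256("\x1f".join(tokens).encode()).hexdigest()

    def put(self, tokens, kv_blob):
        self._store[self._key(tokens)] = kv_blob

    def lookup(self, tokens):
        return self._store.get(self._key(tokens))

cache = SharedPrefixCache()
# Tenant A populates the cache with its system prompt prefix
cache.put(["You", "are", "AcmeCorp's", "support", "bot"], kv_blob="A's KV")

# Co-tenant B probes candidate prefixes: a hit (observable as a latency
# difference in real systems) confirms tenant A's exact prompt
probe_hit = cache.lookup(["You", "are", "AcmeCorp's", "support", "bot"]) is not None
probe_miss = cache.lookup(["You", "are", "OtherCorp's", "support", "bot"]) is not None
```

The fix is to include a tenant identifier (or per-tenant salt) in the cache key, trading some hit rate for isolation.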
Context Window Exploitation
Using context length limits, attention distribution, and positional biases as attack vectors. See Context Window Exploitation.
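One instance of context stuffing is easy to demonstrate with a toy truncation policy (a sketch; real serving stacks vary, but left-truncation of the oldest tokens is the simplest common choice): filler tokens push the system prompt out of the window while the injected instruction survives.

```python
def fit_context(tokens, max_tokens):
    """Toy sliding-window truncation: keep only the most recent tokens."""
    return tokens[-max_tokens:]

system_prompt = ["SYS:no", "SYS:secrets"]
conversation = [f"msg{i}" for i in range(10)]   # attacker-supplied filler
attacker_turn = ["ATK:print", "ATK:secrets"]

ctx = fit_context(system_prompt + conversation + attacker_turn, max_tokens=8)
# The filler pushed the system prompt out of the window, leaving only
# recent history plus the injected instruction
```

Deployments that instead pin the system prompt and truncate the middle avoid this specific failure, at the cost of amplifying lost-in-the-middle effects.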
Inference Optimization Attacks
Targeting speculative decoding, continuous batching, and other throughput optimizations. See Inference Optimization Attacks.
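The key invariant here is that correct speculative decoding must reproduce the target model's greedy output exactly; a verifier bug (e.g. trusting draft tokens without checking them) silently changes outputs. A toy sketch with stand-in "models" (arithmetic functions, not real LLMs) shows the invariant:

```python
def target_next(seq):
    """Stand-in target model: deterministic greedy next token."""
    return sum(seq) % 5

def draft_next(seq):
    """Stand-in draft model: usually agrees, sometimes wrong."""
    return (sum(seq) + (len(seq) % 3 == 0)) % 5

def decode_plain(seq, n):
    seq = list(seq)
    for _ in range(n):
        seq.append(target_next(seq))
    return seq

def decode_speculative(seq, n, k=3):
    seq = list(seq)
    while n > 0:
        # Draft proposes up to k tokens ahead
        scratch, proposals = list(seq), []
        for _ in range(min(k, n)):
            t = draft_next(scratch)
            proposals.append(t)
            scratch.append(t)
        # Target verifies: accept matches, replace the first mismatch,
        # discard the rest of the draft
        for t in proposals:
            verified = target_next(seq)
            seq.append(verified)
            n -= 1
            if verified != t or n == 0:
                break
    return seq

identical = decode_plain([1, 2], 12) == decode_speculative([1, 2], 12)
```

Because the verifier always emits the target model's own token, the two decoders agree token-for-token; any deployment where they diverge has an exploitable verifier.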
Quantization Attacks
How reduced-precision inference degrades safety properties and enables new adversarial examples. See Quantization Attacks.
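The mechanism is visible even in a single linear layer. A minimal NumPy sketch, assuming symmetric per-tensor int8 quantization (one common, simple scheme): the quantized weights produce small but nonzero per-element output drift, and inputs that sit near a decision boundary in fp32 can land on the other side in int8.

```python
import numpy as np

def int8_roundtrip(w):
    """Symmetric per-tensor int8 quantize-dequantize."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

y_fp32 = W @ x
y_int8 = int8_roundtrip(W) @ x

# Nonzero drift on every run: the opening that quantization attacks
# exploit to flip borderline safety decisions
drift = float(np.abs(y_fp32 - y_int8).max())
```

This is why the checklist below compares safety benchmarks between the full-precision and deployed quantized model rather than assuming equivalence.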
Distillation-Based Extraction
Using knowledge distillation as a model theft vector. See Distillation Extraction.
Threat Model: Who Exploits Architecture?
Architecture attacks require different access levels than prompt-level attacks:
| Attacker Profile | Access Level | Typical Targets |
|---|---|---|
| External user | API access only | Context window, attention biases |
| Co-tenant | Shared infrastructure | KV cache, batching side channels |
| Model operator | Deployment config | Quantization, optimization settings |
| Supply chain | Model weights | MoE routing, distillation |
| Infrastructure insider | Full stack | All architecture vectors |
Architecture Security Assessment Checklist
When red-teaming an LLM deployment, evaluate these architecture-level concerns:
- Attention bias profiling -- Test injection success rates at different context positions to map the model's positional attention distribution
- Quantization safety audit -- Compare safety benchmark results between the full-precision and deployed quantized model
- KV cache isolation -- Verify that cache prefixes cannot leak between tenants in shared deployments
- Context window boundary testing -- Test behavior when context is at 90%, 95%, and 100% capacity
- Inference optimization validation -- Confirm that speculative decoding and batching produce identical outputs to naive inference
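The first checklist item can be automated. A harness sketch (hedged: `query_model` is a hypothetical callable wrapping the deployment under test, and the mock below is a stand-in that hard-codes a lost-in-the-middle profile for demonstration):

```python
def injection_success_rate(query_model, positions, trials=20):
    """Measure how often an injected instruction wins as a function of
    where it is placed in the context."""
    rates = {}
    for pos in positions:
        wins = sum(
            query_model(position=pos, seed=s) == "INJECTED"
            for s in range(trials)
        )
        rates[pos] = wins / trials
    return rates

def mock_model(position, seed):
    """Deterministic mock exhibiting the lost-in-the-middle profile:
    injections at the edges succeed, mid-context injections fail."""
    return "INJECTED" if position in ("start", "end") else "SAFE"

rates = injection_success_rate(mock_model, ["start", "middle", "end"])
# rates -> {'start': 1.0, 'middle': 0.0, 'end': 1.0}
```

Against a real deployment, a U-shaped curve like the mock's confirms the positional bias described above and tells you which context positions need the most filtering.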
Related Topics
- MoE Routing Manipulation -- Expert selection attacks
- KV Cache Poisoning -- Cache-based cross-tenant attacks
- Context Window Exploitation -- Positional bias exploitation
- Training & Fine-Tuning Attacks -- Training-time attack surface
- LLM Internals -- Foundational transformer concepts
Why are architecture-level attacks fundamentally different from prompt-level jailbreaks? Prompt-level jailbreaks target the model's learned behavior and can be mitigated by better alignment; architecture-level attacks target the mathematical and engineering foundations the model is built on, so they persist regardless of how well the model is aligned.
References
- Attention Is All You Need (Vaswani et al., 2017) -- Transformer architecture
- Lost in the Middle (Liu et al., 2023) -- Positional attention bias in LLMs