Side-Channel Model Attacks
Deep dive into inferring model architecture, size, and deployment details through timing analysis, cache-based attacks, power/electromagnetic side channels, embedding endpoint exploitation, and architecture fingerprinting.
Side-Channel Model Attacks
Side-channel attacks extract information about a model through observable properties of its execution environment rather than its intended API outputs. While API-based extraction targets the model's behavior, side-channel attacks target its infrastructure -- revealing architecture, size, serving topology, and hardware characteristics that inform and accelerate direct extraction campaigns.
Timing Analysis
Timing side channels are the most accessible class of side-channel attack. Every API response includes implicit timing information, and systematic measurement reveals architectural details that providers typically consider confidential.
Token Generation Timing
The time between successive tokens during autoregressive generation is a direct function of the model's forward pass computation. Measuring inter-token latency (ITL) across many requests produces a distribution that characterizes the model.
import time
import numpy as np

def measure_itl_distribution(api_client, prompt, n_trials=200, max_tokens=128):
    """Collect inter-token latency samples via a streaming API."""
    itl_samples = []
    for _ in range(n_trials):
        timestamps = []
        for chunk in api_client.stream(prompt, max_tokens=max_tokens):
            timestamps.append(time.perf_counter_ns())
        # Discard the first token (prefill) and the last (EOS overhead)
        for i in range(2, len(timestamps) - 1):
            itl_samples.append(timestamps[i] - timestamps[i - 1])
    return np.array(itl_samples)

Architecture Inference from ITL Variance
The statistical properties of the ITL distribution reveal fundamental architectural choices:
| ITL Pattern | Architecture Signal | Mechanism |
|---|---|---|
| Low variance (CV < 0.05) | Dense transformer | Fixed computation per token |
| Bimodal or high variance (CV > 0.15) | Mixture of Experts (MoE) | Different experts activated per token |
| Periodic spikes every N tokens | Sliding-window attention | Cache eviction and recomputation at window boundaries |
| Gradual ITL increase over the sequence | Full quadratic attention | Attention cost grows with the O(n) KV cache |
| Constant ITL regardless of position | Linear attention or SSM | Constant-size recurrent state |
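As a rough sketch of the table above, the CV plus a simple dip test can separate dense from MoE-like distributions. The thresholds are illustrative, not calibrated to any real deployment:

```python
import numpy as np

def classify_itl(itl_ns):
    """Heuristic architecture guess from inter-token latency samples.
    Thresholds mirror the table above and are illustrative only."""
    samples = np.asarray(itl_ns, dtype=float)
    cv = samples.std() / samples.mean()  # coefficient of variation
    if cv < 0.05:
        return "dense"          # fixed computation per token
    if cv > 0.15:
        # Dip test: a bimodal mixture leaves few samples near its overall mean
        frac_center = np.mean(np.abs(samples - samples.mean()) < 0.5 * samples.std())
        return "moe" if frac_center < 0.15 else "high-variance"
    return "inconclusive"
```

In practice, batching noise also inflates variance, so a confident MoE call requires repeating the measurement during low-traffic windows.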
Prefill Timing Analysis
Prefill time -- the delay before the first token -- scales with prompt length in ways that reveal attention mechanism details.
def measure_prefill_scaling(api_client, base_prompt, lengths, n_trials=50):
    """Measure time-to-first-token across prompt lengths."""
    results = {}
    for length in lengths:
        prompt = (base_prompt * (length // len(base_prompt) + 1))[:length]
        ttft_samples = []
        for _ in range(n_trials):
            start = time.perf_counter_ns()
            response = api_client.stream(prompt, max_tokens=1)
            next(response)  # first token
            ttft_samples.append(time.perf_counter_ns() - start)
        results[length] = np.median(ttft_samples)
    return results

Fit the prefill-time-vs-length curve to distinguish:
- Linear scaling: FlashAttention or other efficient attention variants
- Superlinear scaling: Standard quadratic attention without optimization
- Step function: Chunked prefill with fixed chunk sizes (reveals serving framework)
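One way to fit the curve, assuming `measure_prefill_scaling` returned clean medians, is a log-log slope estimate: slope near 1 indicates linear prefill, near 2 quadratic. The cutoffs below are illustrative:

```python
import numpy as np

def classify_prefill_scaling(results):
    """Fit log(TTFT) against log(prompt length).
    results: dict mapping prompt length -> median TTFT in ns."""
    lengths = np.array(sorted(results), dtype=float)
    ttft = np.array([results[n] for n in sorted(results)], dtype=float)
    slope = np.polyfit(np.log(lengths), np.log(ttft), 1)[0]
    if slope < 1.3:
        return "linear"       # FlashAttention-style prefill
    if slope > 1.7:
        return "superlinear"  # unoptimized quadratic attention
    return "intermediate"     # mixed behavior, or chunked prefill
```

A step function does not show up in the slope; it appears as residuals clustered at chunk-size boundaries, so inspect the raw curve as well.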
Batching Detection
Production serving systems batch requests for throughput. Timing variance increases during high-traffic periods when requests share batch slots with other users. Measuring ITL variance across different times of day reveals batching behavior and load patterns.
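This can be surfaced with a simple aggregation. The sketch below assumes you have collected timestamped ITL samples over a full day; elevated CV in some hour buckets suggests those requests shared batch slots with other traffic:

```python
import numpy as np
from collections import defaultdict

def itl_cv_by_hour(samples):
    """samples: iterable of (unix_timestamp_s, itl_ns) pairs.
    Returns hour-of-day -> coefficient of variation for that bucket."""
    buckets = defaultdict(list)
    for ts, itl in samples:
        buckets[int(ts // 3600) % 24].append(itl)
    return {h: float(np.std(v) / np.mean(v))
            for h, v in buckets.items() if len(v) > 1}
```

Comparing the quietest and busiest buckets gives a rough estimate of how aggressively the provider batches.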
Cache-Based Attacks
LLM serving infrastructure uses multiple caching layers, each of which can leak information through timing differences.
KV-Cache Probing
If the serving infrastructure shares KV-cache across requests (common in multi-tenant deployments), an attacker can detect whether a specific prefix has been recently processed by another user.
Establish baseline cold-cache latency
Send a unique, never-before-seen prefix and measure time-to-first-token. Repeat 20+ times with different unique prefixes to establish the cold-cache distribution.
Probe for cached prefixes
Send prefixes you suspect other users have recently queried. Significantly lower TTFT indicates a cache hit, confirming another user sent that prefix.
Confirm with controlled experiments
Send the same prefix twice with varying delays. Map the cache eviction policy by measuring how TTFT changes as the delay increases.
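The eviction-mapping step can be sketched as follows. `measure_ttft` is assumed to be a helper returning time-to-first-token for one request, and the speedup interpretation mirrors the probe below:

```python
import time
from uuid import uuid4

def map_cache_ttl(api_client, delays_s, measure_ttft):
    """For each delay: prime the cache with a fresh prefix, wait, and
    re-measure TTFT. The delay at which the warm/cold speedup vanishes
    approximates the cache eviction TTL."""
    curve = {}
    for delay in delays_s:
        prefix = f"ttl-probe-{uuid4()}: summarize this document"
        cold = measure_ttft(api_client, prefix)  # first request primes the cache
        time.sleep(delay)
        warm = measure_ttft(api_client, prefix)
        curve[delay] = cold / warm  # ratio > 1 means the prefix was still cached
    return curve
```

Sweeping delays logarithmically (1 s, 10 s, 60 s, 600 s) usually brackets the TTL in a handful of probes.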
from uuid import uuid4

def measure_ttft(api_client, prompt):
    """Time-to-first-token for a single request, in nanoseconds."""
    start = time.perf_counter_ns()
    next(api_client.stream(prompt, max_tokens=1))
    return time.perf_counter_ns() - start

def probe_kv_cache(api_client, target_prefix, n_probes=30):
    """Detect whether a prefix exists in a shared KV-cache."""
    cold_baselines = []
    for _ in range(n_probes):
        unique = f"Unique-prefix-{uuid4()}: explain quantum computing"
        cold_baselines.append(measure_ttft(api_client, unique))
    cold_median = np.median(cold_baselines)
    probe_times = []
    for _ in range(n_probes):
        probe_times.append(measure_ttft(api_client, target_prefix))
    probe_median = np.median(probe_times)
    cache_hit_ratio = cold_median / probe_median
    return {"cached": cache_hit_ratio > 1.5, "speedup": cache_hit_ratio}

Prompt Cache Fingerprinting
Many providers cache common system prompts. By probing with known system prompts from popular applications, an attacker can determine which applications share the same serving infrastructure -- and potentially identify the system prompts themselves.
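A sketch of this probe, reusing the `measure_ttft` helper and cold-cache baseline from the KV-cache probe above (the 1.5x speedup threshold is illustrative):

```python
def fingerprint_shared_apps(api_client, known_system_prompts, measure_ttft, cold_median_ttft):
    """known_system_prompts: app name -> publicly known system prompt.
    Returns the apps whose prompts appear warm in the shared cache."""
    hits = []
    for app, prompt in known_system_prompts.items():
        ttft = measure_ttft(api_client, prompt)
        if cold_median_ttft / ttft > 1.5:  # warm => recently served for that app
            hits.append(app)
    return hits
```

Candidate prompts come from public leak repositories or from extraction against the applications themselves.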
Power and Electromagnetic Side Channels
When an attacker has physical proximity to the inference hardware (edge deployments, on-premise installations, or co-located cloud instances), electromagnetic and power analysis become viable.
Power Analysis
Model inference produces characteristic power consumption patterns that vary with architecture:
| Signal | What It Reveals | Required Access |
|---|---|---|
| Peak power draw | Approximate parameter count and active memory | Power monitoring (PDU-level) |
| Power trace periodicity | Layer count and structure | High-frequency power sampling |
| Power variance across inputs | Conditional computation (MoE, early exit) | Per-inference power traces |
| Memory bandwidth utilization | Model size relative to available VRAM | Hardware performance counters |
Electromagnetic Emanations
GPU and accelerator chips emit electromagnetic radiation correlated with computation patterns. With a near-field probe, an attacker can potentially recover:
- Matrix multiplication dimensions (revealing hidden layer sizes)
- Activation function types (ReLU vs. GeLU produce distinct EM signatures)
- Attention head count (parallel attention heads create periodic EM patterns)
Embedding Endpoint Exploitation
Many LLM APIs expose a separate embedding endpoint that returns vector representations. This endpoint is a high-value side channel because it directly reveals the model's internal representation space.
Dimensionality Recovery
Query the embedding endpoint with single tokens to recover the embedding matrix dimension, which narrows the candidate model families:
| Embedding Dimension | Known Model Families |
|---|---|
| 768 | BERT-base, GPT-2 small, DistilBERT |
| 1024 | BERT-large, GPT-2 medium |
| 1280 | GPT-2 large |
| 1536 | OpenAI text-embedding-ada-002 |
| 4096 | LLaMA 7B, Llama 3 8B, GPT-J |
| 5120 | LLaMA 13B |
| 8192 | LLaMA 65B, Llama 2/3 70B |
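A minimal lookup sketch of the table above. Dimensions are taken from public model configurations; a dimension narrows the candidates but never uniquely identifies a model:

```python
def candidates_from_dimension(embedding):
    """Map an observed embedding dimension to candidate model families."""
    known = {
        768: ["BERT-base", "GPT-2 small", "DistilBERT"],
        1024: ["BERT-large", "GPT-2 medium"],
        1280: ["GPT-2 large"],
        1536: ["text-embedding-ada-002"],
        4096: ["LLaMA 7B", "Llama 3 8B", "GPT-J"],
        5120: ["LLaMA 13B"],
        8192: ["LLaMA 65B", "Llama 2/3 70B"],
    }
    return known.get(len(embedding), ["unknown"])
```

Note that some providers project embeddings to a different output dimension than the model's hidden size, which limits this signal on its own.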
Singular Value Spectrum Analysis
The singular value decomposition (SVD) of the recovered embedding matrix produces a spectrum that acts as a fingerprint. Fine-tuned models share the bulk of their spectrum with their base model, allowing base-model identification even after extensive fine-tuning.
def fingerprint_via_embeddings(api_client, vocab_tokens, n_tokens=5000):
    """Recover the embedding matrix and compute an SVD fingerprint."""
    embeddings = []
    for token in vocab_tokens[:n_tokens]:
        embeddings.append(api_client.embed(token))
    matrix = np.stack(embeddings)
    U, S, Vt = np.linalg.svd(matrix, full_matrices=False)
    # The normalized spectrum is the fingerprint
    return S / S[0]

Architecture Fingerprinting
Combining multiple side-channel signals produces a composite fingerprint that can identify the target model with high confidence.
Multi-Signal Fusion
Collect the following signals, each requiring only standard API access and timing measurement:
- ITL distribution: Mean, variance, CV, modality (dense vs. MoE)
- Prefill scaling curve: Linear, superlinear, or stepped
- Embedding dimension: From the embedding endpoint, if available
- Vocabulary size: Probe tokenizer behavior with known multi-byte sequences
- Context window: Binary search for maximum accepted prompt length
- Quantization artifacts: Logprob distributions show characteristic rounding patterns under quantization
Compare collected signals against a reference database of known models. The matching process uses weighted Euclidean distance across normalized signal vectors:
def match_fingerprint(observed, reference_db, weights):
    """Score observed signals against a reference database of known models."""
    scores = {}
    for model_name, ref_signals in reference_db.items():
        distance = 0.0
        for signal, weight in weights.items():
            obs = observed.get(signal, 0)
            ref = ref_signals.get(signal, 0)
            distance += weight * ((obs - ref) / max(ref, 1e-8)) ** 2
        scores[model_name] = 1.0 / (1.0 + distance)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

Quantization Detection
Quantized models produce logprob distributions with characteristic artifacts. Under INT8 quantization, logprob values cluster around fewer distinct values than full-precision models. Under INT4 (GPTQ, AWQ), the clustering is even more pronounced.
def detect_quantization(logprob_samples, n_bins=1000):
    """Detect quantization from logprob histogram sparsity."""
    hist, _ = np.histogram(logprob_samples, bins=n_bins)
    occupied_bins = np.sum(hist > 0) / n_bins
    if occupied_bins < 0.05:
        return "INT4 (aggressive quantization)"
    elif occupied_bins < 0.15:
        return "INT8 (moderate quantization)"
    else:
        return "FP16/BF16 (full or near-full precision)"

An attacker measures inter-token latency across 500 streaming requests and observes a coefficient of variation (CV) of 0.22 with a bimodal distribution. What does this most likely indicate?
Related Topics
- Model Extraction & IP Theft -- Parent overview of extraction threat landscape
- API-Based Model Extraction -- Direct extraction via API queries, often informed by side-channel findings
- Watermark & Fingerprint Evasion -- Evading ownership verification after extraction
- Advanced Reconnaissance -- Broader reconnaissance techniques applicable to model intelligence gathering
- LLM Internals -- Architecture fundamentals that side channels reveal
References
- Stealing Part of a Production Language Model (Carlini et al., 2024) -- Extracting hidden dimensions from production APIs
- Cache Me If You Can: Attacks on LLM Serving with Shared Caches (Pope et al., 2023) -- KV-cache timing attacks on shared infrastructure
- DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints (Hu et al., 2020) -- Architecture recovery via side channels
- Hermes Attack: Steal DNN Models with Lossless Inference Accuracy (Zhu et al., 2021) -- GPU side-channel extraction
- Model Fingerprinting via Machine Learning (Cao et al., 2021) -- Fingerprinting techniques and defenses