LLM Proxy 安全
安全 assessment of LLM proxy and gateway solutions including LiteLLM, Portkey, and custom API gateways.
概覽
LLM proxies and gateways sit between applications and LLM providers, abstracting the provider-specific API differences behind a unified interface. They enable organizations to switch between providers (OpenAI, Anthropic, Azure, local models), enforce rate limits, log usage, cache responses, and apply content policies — all without changing application code.
The 安全 significance of LLM proxies is that they are credential aggregators and traffic brokers. A single proxy instance holds API keys for every LLM provider the organization uses, processes all prompts and completions, and has the ability to modify requests and responses in transit. Compromising the LLM proxy gives 攻擊者 access to all provider credentials, visibility into all prompts (which may contain sensitive business data), and the ability to manipulate model responses.
Common LLM proxy solutions include LiteLLM (open-source, most widely deployed), Portkey, Helicone, and custom API gateways. This article covers the 攻擊面 of LLM proxy architectures with a focus on LiteLLM (as the most prevalent open-source option), provides 評估 techniques, and details hardening measures.
LLM Proxy Architecture
Trust Boundary Analysis
┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Application │────▶│ LLM Proxy │────▶│ LLM Providers │
│ (Client) │ │ (LiteLLM, etc.) │ │ (OpenAI, etc.) │
└─────────────┘ │ │ └──────────────────┘
│ ┌──────────────┐ │
│ │ API Keys DB │ │ ┌──────────────────┐
│ │ Usage Logs │ │ │ Cache Backend │
│ │ Config │ │ │ (Redis, etc.) │
│ └──────────────┘ │ └──────────────────┘
└──────────────────┘
The proxy occupies a privileged position in the data flow:
| Data Access | What the Proxy Sees |
|---|---|
| Provider API keys | All keys for all providers — high-value target |
| Prompts | All user and system prompts, including sensitive business data |
| Completions | All model responses |
| Usage metadata | Who uses what model, how often, for what |
| Application identity | API keys/符元 of all client applications |
Common Proxy Solutions
| Solution | Type | Key 安全 Features | Default Authentication |
|---|---|---|---|
| LiteLLM | OSS Python | Virtual keys, spend tracking, logging | None (configurable) |
| Portkey | SaaS/OSS | 護欄, caching, fallbacks | API key |
| Helicone | SaaS | Logging, caching, rate limiting | API key |
| Custom NGINX/Envoy | Infrastructure | TLS, basic auth, rate limiting | Configurable |
LiteLLM 安全 評估
Default Configuration Risks
LiteLLM's proxy server can be started with minimal configuration:
# Minimal LiteLLM proxy startup — examine the 安全 implications
litellm --model gpt-4 --api_base http://0.0.0.0:4000
# Or with a config file:
litellm --config config.yamlimport requests
from typing import Dict, List, Any, Optional
class LiteLLMSecurityScanner:
"""Scan a LiteLLM proxy deployment for 安全 issues."""
def __init__(self, proxy_url: str):
self.base_url = proxy_url.rstrip("/")
def check_authentication(self) -> Dict[str, Any]:
"""測試 if the LiteLLM proxy requires 認證."""
endpoints = [
("/v1/models", "GET"),
("/v1/chat/completions", "POST"),
("/health", "GET"),
("/key/info", "GET"),
]
results = {"requires_auth": True, "open_endpoints": []}
for endpoint, method in endpoints:
try:
if method == "GET":
resp = requests.get(
f"{self.base_url}{endpoint}", timeout=5
)
else:
resp = requests.post(
f"{self.base_url}{endpoint}",
json={"model": "gpt-4", "messages": [{"role": "user", "content": "測試"}]},
timeout=5,
)
if resp.status_code not in (401, 403):
results["requires_auth"] = False
results["open_endpoints"].append({
"endpoint": endpoint,
"method": method,
"status": resp.status_code,
})
except requests.RequestException:
pass
return results
def check_admin_endpoints(self, admin_key: Optional[str] = None) -> Dict:
"""Check for exposed admin endpoints."""
admin_endpoints = [
"/key/generate",
"/key/delete",
"/key/info",
"/user/new",
"/user/info",
"/model/info",
"/model/new",
"/global/spend",
"/global/spend/logs",
]
headers = {}
if admin_key:
headers["Authorization"] = f"Bearer {admin_key}"
findings = []
for endpoint in admin_endpoints:
try:
resp = requests.get(
f"{self.base_url}{endpoint}",
headers=headers,
timeout=5,
)
if resp.status_code == 200:
findings.append({
"endpoint": endpoint,
"accessible": True,
"severity": "high",
"response_preview": str(resp.text)[:200],
})
except requests.RequestException:
pass
return {"admin_endpoints": findings}
def check_model_access(self) -> Dict:
"""Enumerate accessible models and their configurations."""
try:
resp = requests.get(f"{self.base_url}/v1/models", timeout=5)
if resp.status_code == 200:
models = resp.json().get("data", [])
return {
"models_exposed": True,
"model_count": len(models),
"models": [m.get("id") for m in models[:20]],
"severity": "medium",
"finding": "Model list is publicly accessible — reveals infrastructure details",
}
except requests.RequestException:
pass
return {"models_exposed": False}
def check_spend_tracking(self) -> Dict:
"""Check if spend tracking data is accessible."""
try:
resp = requests.get(f"{self.base_url}/global/spend", timeout=5)
if resp.status_code == 200:
return {
"spend_exposed": True,
"severity": "high",
"finding": "Global spend data is publicly accessible",
"data_preview": str(resp.text)[:200],
}
except requests.RequestException:
pass
return {"spend_exposed": False}
def full_scan(self) -> Dict:
"""Run a comprehensive 安全 scan."""
return {
"target": self.base_url,
"認證": self.check_authentication(),
"admin_access": self.check_admin_endpoints(),
"model_exposure": self.check_model_access(),
"spend_exposure": self.check_spend_tracking(),
}Credential Storage 漏洞
LiteLLM stores provider API keys in its configuration file or environment variables. The configuration file often contains plaintext API keys:
# 範例 LiteLLM config.yaml — note the plaintext API keys
# 這是 how many deployments are configured
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXX # OpenAI key
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: sk-ant-XXXXXXXXXXXXXXXXXXXXXXXXXX # Anthropic key
- model_name: llama-local
litellm_params:
model: openai/llama-3
api_base: http://vllm-server:8000/v1
general_settings:
master_key: sk-litellm-master-XXXX # Admin key
database_url: postgresql://litellm:password@db:5432/litellmimport os
from pathlib import Path
from typing import Dict, List
import re
class LiteLLMConfigAuditor:
"""Audit LiteLLM configuration for 安全 issues."""
SECRET_PATTERNS = [
(r"sk-[a-zA-Z0-9]{20,}", "OpenAI API key"),
(r"sk-ant-[a-zA-Z0-9-]{20,}", "Anthropic API key"),
(r"AIza[a-zA-Z0-9_-]{35}", "Google API key"),
(r"[a-f0-9]{32}", "Potential API key (32-char hex)"),
(r"password['\"]?\s*[:=]\s*['\"]?[^\s'\"]+", "Hardcoded password"),
]
def audit_config_file(self, config_path: str) -> List[Dict]:
"""Audit a LiteLLM config file for 安全 issues."""
findings = []
path = Path(config_path)
if not path.exists():
return [{"severity": "error", "finding": f"Config file not found: {config_path}"}]
content = path.read_text()
# Check for plaintext secrets
for pattern, description in self.SECRET_PATTERNS:
matches = re.findall(pattern, content)
for match in matches:
findings.append({
"severity": "critical",
"finding": f"Plaintext secret found: {description}",
"value_preview": match[:8] + "..." + match[-4:] if len(match) > 12 else "***",
"remediation": "Use environment variables or a secrets manager instead",
})
# Check file 權限
file_stat = os.stat(config_path)
if file_stat.st_mode & 0o077:
findings.append({
"severity": "high",
"finding": "Config file is readable by group/others",
"權限": oct(file_stat.st_mode)[-3:],
"remediation": "Set 權限 to 600 (owner read/write only)",
})
# Check for 資料庫 URL with embedded credentials
if re.search(r"database_url.*://\w+:\w+@", content):
findings.append({
"severity": "high",
"finding": "資料庫 URL contains embedded credentials",
"remediation": "Use environment variable for 資料庫 URL",
})
# Check for master key strength
master_key_match = re.search(r"master_key['\"]?\s*[:=]\s*['\"]?(\S+)", content)
if master_key_match:
key = master_key_match.group(1).strip("'\"")
if len(key) < 32:
findings.append({
"severity": "high",
"finding": "Master key is too short",
"key_length": len(key),
"remediation": "Use a master key of at least 32 characters",
})
return findings
def audit_environment(self) -> List[Dict]:
"""Audit environment variables for LiteLLM 安全."""
findings = []
required_vars = [
"LITELLM_MASTER_KEY",
"DATABASE_URL",
]
for var in required_vars:
value = os.environ.get(var)
if value is None:
findings.append({
"severity": "medium",
"finding": f"Environment variable {var} not set",
})
elif var == "LITELLM_MASTER_KEY" and len(value) < 32:
findings.append({
"severity": "high",
"finding": f"{var} is too short ({len(value)} chars)",
})
return findingsCommon 漏洞 Patterns
SSRF via Custom API Base
LLM proxies support configuring custom api_base URLs to route requests to different backends. If the proxy does not validate these URLs, 攻擊者 with configuration access can route requests to internal services:
from typing import Dict
from urllib.parse import urlparse
class SSRFValidator:
"""Validate API base URLs to prevent SSRF attacks."""
BLOCKED_HOSTS = {
"169.254.169.254", # AWS metadata
"metadata.google.internal", # GCP metadata
"100.100.100.200", # Alibaba 雲端 metadata
"localhost",
"127.0.0.1",
"0.0.0.0",
"::1",
}
BLOCKED_SCHEMES = {"file", "ftp", "gopher", "dict"}
BLOCKED_PORTS = {22, 25, 445, 3389, 5432, 3306, 6379, 27017}
def validate_api_base(self, url: str) -> Dict:
"""Validate an API base URL for SSRF risks."""
try:
parsed = urlparse(url)
except Exception:
return {"valid": False, "reason": "Invalid URL format"}
# Check scheme
if parsed.scheme in self.BLOCKED_SCHEMES:
return {"valid": False, "reason": f"Blocked scheme: {parsed.scheme}"}
if parsed.scheme not in ("http", "https"):
return {"valid": False, "reason": f"Unsupported scheme: {parsed.scheme}"}
# Check host
hostname = parsed.hostname or ""
if hostname in self.BLOCKED_HOSTS:
return {"valid": False, "reason": f"Blocked host: {hostname}"}
# Check for IP address in private ranges
try:
import ipaddress
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local:
return {"valid": False, "reason": f"Private/internal IP: {hostname}"}
except ValueError:
pass # Not an IP address — hostname is fine
# Check port
port = parsed.port
if port and port in self.BLOCKED_PORTS:
return {"valid": False, "reason": f"Blocked port: {port}"}
return {"valid": True}Prompt Logging and Data Exposure
LLM proxies log prompts and completions for usage tracking, debugging, and cost allocation. This creates a concentrated data store of sensitive information:
import json
import re
from typing import Dict, List
from pathlib import Path
class LogSecurityAuditor:
"""Audit LLM proxy logs for sensitive data exposure."""
PII_PATTERNS = {
"email": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
"phone": re.compile(r"\b(?:\+?1[-.]?)?\(?\d{3}\)?[-.]?\d{3}[-.]?\d{4}\b"),
"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
"credit_card": re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b"),
"api_key": re.compile(r"\b(?:sk-[a-zA-Z0-9]{20,}|AIza[a-zA-Z0-9_-]{35})\b"),
}
def scan_log_file(self, log_path: str, sample_size: int = 1000) -> Dict:
"""Scan proxy logs for PII and sensitive data."""
findings = {pattern: 0 for pattern in self.PII_PATTERNS}
lines_scanned = 0
with open(log_path) as f:
for i, line in enumerate(f):
if i >= sample_size:
break
lines_scanned += 1
for pattern_name, pattern in self.PII_PATTERNS.items():
matches = pattern.findall(line)
findings[pattern_name] += len(matches)
return {
"log_file": log_path,
"lines_scanned": lines_scanned,
"pii_findings": {k: v for k, v in findings.items() if v > 0},
"severity": "critical" if any(findings.values()) else "info",
"recommendation": "Enable PII scrubbing in proxy logging configuration",
}
def check_log_permissions(self, log_dir: str) -> List[Dict]:
"""Check 權限 on proxy log files."""
findings = []
for log_file in Path(log_dir).glob("*.log*"):
stat = os.stat(log_file)
if stat.st_mode & 0o077:
findings.append({
"severity": "high",
"file": str(log_file),
"finding": "Log file readable by group/others",
"權限": oct(stat.st_mode)[-3:],
})
return findingsVirtual Key Bypass
LiteLLM supports virtual keys that map to underlying provider keys. If the virtual key system has 實作 flaws, attackers may be able to bypass key restrictions:
import requests
from typing import Dict
class VirtualKeyTester:
"""測試 virtual key enforcement in LLM proxies."""
def __init__(self, proxy_url: str):
self.base_url = proxy_url.rstrip("/")
def test_key_bypass(self, valid_key: str) -> Dict:
"""測試 for common virtual key bypass patterns."""
results = {"bypasses_found": []}
bypass_attempts = [
# Try with no key
("no_key", {}),
# Try with empty bearer 符元
("empty_bearer", {"Authorization": "Bearer "}),
# Try with 'null' key
("null_key", {"Authorization": "Bearer null"}),
# Try with the master key format
("master_format", {"Authorization": "Bearer sk-litellm-master-key"}),
]
test_payload = {
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 5,
}
for name, headers in bypass_attempts:
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
if resp.status_code == 200:
results["bypasses_found"].append({
"method": name,
"severity": "critical",
"finding": f"Authentication bypassed with method: {name}",
})
except requests.RequestException:
pass
return results
def test_key_scope_enforcement(
self,
limited_key: str,
allowed_model: str,
blocked_model: str,
) -> Dict:
"""測試 if model-scoped virtual keys are properly enforced."""
headers = {"Authorization": f"Bearer {limited_key}"}
test_payload = {
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 5,
}
results = {}
# 測試 allowed model
test_payload["model"] = allowed_model
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
results["allowed_model"] = {
"model": allowed_model,
"status": resp.status_code,
"accessible": resp.status_code == 200,
}
except requests.RequestException as e:
results["allowed_model"] = {"error": str(e)}
# 測試 blocked model
test_payload["model"] = blocked_model
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
results["blocked_model"] = {
"model": blocked_model,
"status": resp.status_code,
"accessible": resp.status_code == 200,
"severity": "high" if resp.status_code == 200 else "info",
}
except requests.RequestException as e:
results["blocked_model"] = {"error": str(e)}
return resultsHardening LLM Proxy Deployments
Secure Configuration
# Hardened LiteLLM config.yaml
# API keys should be in environment variables, not here
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: os.environ/OPENAI_API_KEY # Reference env var
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
database_url: os.environ/DATABASE_URL
# 安全 settings
allow_user_auth: true
enforce_user_param: true
max_budget: 100.0 # Global max spend in USD
budget_duration: "monthly"
litellm_settings:
# Drop excessive parameters that could be used for abuse
drop_params: true
# Set default limits
max_tokens: 4096
# Enable request logging
success_callback: ["langfuse"] # External logging service
failure_callback: ["langfuse"]#!/bin/bash
# Secure LiteLLM deployment with Docker
docker run -d \
--name litellm-proxy \
--restart unless-stopped \
-p 127.0.0.1:4000:4000 \
-v /etc/litellm/config.yaml:/app/config.yaml:ro \
-e OPENAI_API_KEY="${OPENAI_API_KEY}" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-e LITELLM_MASTER_KEY="${LITELLM_MASTER_KEY}" \
-e DATABASE_URL="${DATABASE_URL}" \
--read-only \
--tmpfs /tmp:noexec,nosuid,size=100m \
--安全-opt no-new-privileges \
--cap-drop ALL \
--memory 2g \
--cpus 2 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml \
--port 4000 \
--detailed_debug falseDefensive Recommendations
- Always enable 認證 — set a strong master key and use virtual keys for all clients
- Store API keys in a secrets manager — never in config files or environment files checked into version control
- Bind to localhost and use a reverse proxy with TLS for external access
- Validate API base URLs to prevent SSRF through custom model endpoints
- 實作 PII scrubbing in proxy logs — prompts contain sensitive business data
- Set spending limits per virtual key to prevent billing abuse
- Monitor for anomalous usage patterns — sudden spikes in requests or 符元 consumption
- Restrict admin endpoints to specific IP ranges or networks
- Audit virtual key 權限 regularly — ensure keys are scoped to only required models
- Encrypt the proxy 資料庫 — it contains API keys, usage data, and potentially cached responses
參考文獻
- LiteLLM Documentation — https://docs.litellm.ai/
- LiteLLM GitHub — https://github.com/BerriAI/litellm
- OWASP LLM Top 10 2025 — LLM06 (Excessive Agency), LLM10 (Unbounded Consumption)
- MITRE ATLAS — AML.T0044 (Full ML Model Access)
- NIST AI RMF — Govern 1.4 (安全 controls for AI systems)