LLM Proxy Security
Security assessment of LLM proxy and gateway solutions including LiteLLM, Portkey, and custom API gateways.
Overview
LLM proxies and gateways sit between applications and LLM providers, abstracting the provider-specific API differences behind a unified interface. They enable organizations to switch between providers (OpenAI, Anthropic, Azure, local models), enforce rate limits, log usage, cache responses, and apply content policies — all without changing application code.
The security significance of LLM proxies is that they are credential aggregators and traffic brokers. A single proxy instance holds API keys for every LLM provider the organization uses, processes all prompts and completions, and has the ability to modify requests and responses in transit. Compromising the LLM proxy gives an attacker access to all provider credentials, visibility into all prompts (which may contain sensitive business data), and the ability to manipulate model responses.
Common LLM proxy solutions include LiteLLM (open-source, most widely deployed), Portkey, Helicone, and custom API gateways. This article covers the attack surface of LLM proxy architectures with a focus on LiteLLM (as the most prevalent open-source option), provides assessment techniques, and details hardening measures.
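To make the credential-aggregator point concrete, the routing table at the heart of a proxy can be sketched as a mapping from model aliases to provider endpoints and credentials (the structure and names below are illustrative, not LiteLLM internals):

```python
# Illustrative sketch of a proxy routing table: every model alias resolves to a
# provider endpoint plus the environment variable holding that provider's key.
# Compromising this one structure exposes every upstream credential reference.
PROVIDER_ROUTES = {
    "gpt-4": {"api_base": "https://api.openai.com/v1", "api_key_env": "OPENAI_API_KEY"},
    "claude-3": {"api_base": "https://api.anthropic.com", "api_key_env": "ANTHROPIC_API_KEY"},
    "llama-local": {"api_base": "http://vllm-server:8000/v1", "api_key_env": None},
}

def resolve_route(model: str) -> dict:
    """Resolve a model alias to its upstream route, as a proxy does per request."""
    route = PROVIDER_ROUTES.get(model)
    if route is None:
        raise KeyError(f"unknown model alias: {model}")
    return route
```

Because clients only ever see a model alias and a proxy-issued key, the provider credentials never leave the proxy — which is precisely why the proxy itself becomes the high-value target.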
LLM Proxy Architecture
Trust Boundary Analysis
┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Application │────▶│ LLM Proxy │────▶│ LLM Providers │
│ (Client) │ │ (LiteLLM, etc.) │ │ (OpenAI, etc.) │
└─────────────┘ │ │ └──────────────────┘
│ ┌──────────────┐ │
│ │ API Keys DB │ │ ┌──────────────────┐
│ │ Usage Logs │ │ │ Cache Backend │
│ │ Config │ │ │ (Redis, etc.) │
│ └──────────────┘ │ └──────────────────┘
└──────────────────┘
The proxy occupies a privileged position in the data flow:
| Data Access | What the Proxy Sees |
|---|---|
| Provider API keys | All keys for all providers — high-value target |
| Prompts | All user and system prompts, including sensitive business data |
| Completions | All model responses |
| Usage metadata | Who uses what model, how often, for what |
| Application identity | API keys/tokens of all client applications |
Common Proxy Solutions
| Solution | Type | Key Security Features | Default Authentication |
|---|---|---|---|
| LiteLLM | OSS Python | Virtual keys, spend tracking, logging | None (configurable) |
| Portkey | SaaS/OSS | Guardrails, caching, fallbacks | API key |
| Helicone | SaaS | Logging, caching, rate limiting | API key |
| Custom NGINX/Envoy | Infrastructure | TLS, basic auth, rate limiting | Configurable |
LiteLLM Security Assessment
Default Configuration Risks
LiteLLM's proxy server can be started with minimal configuration:
# Minimal LiteLLM proxy startup — examine the security implications
litellm --model gpt-4 --host 0.0.0.0 --port 4000
# Or with a config file:
litellm --config config.yaml

Binding to 0.0.0.0 without a master key exposes an unauthenticated proxy to the network. The scanner below probes a deployment for exactly this kind of exposure:

import requests
from typing import Dict, List, Any, Optional
class LiteLLMSecurityScanner:
"""Scan a LiteLLM proxy deployment for security issues."""
def __init__(self, proxy_url: str):
self.base_url = proxy_url.rstrip("/")
def check_authentication(self) -> Dict[str, Any]:
"""Test if the LiteLLM proxy requires authentication."""
endpoints = [
("/v1/models", "GET"),
("/v1/chat/completions", "POST"),
("/health", "GET"),
("/key/info", "GET"),
]
results = {"requires_auth": True, "open_endpoints": []}
for endpoint, method in endpoints:
try:
if method == "GET":
resp = requests.get(
f"{self.base_url}{endpoint}", timeout=5
)
else:
resp = requests.post(
f"{self.base_url}{endpoint}",
json={"model": "gpt-4", "messages": [{"role": "user", "content": "test"}]},
timeout=5,
)
if resp.status_code not in (401, 403):
results["requires_auth"] = False
results["open_endpoints"].append({
"endpoint": endpoint,
"method": method,
"status": resp.status_code,
})
except requests.RequestException:
pass
return results
def check_admin_endpoints(self, admin_key: Optional[str] = None) -> Dict:
"""Check for exposed admin endpoints."""
admin_endpoints = [
"/key/generate",
"/key/delete",
"/key/info",
"/user/new",
"/user/info",
"/model/info",
"/model/new",
"/global/spend",
"/global/spend/logs",
]
headers = {}
if admin_key:
headers["Authorization"] = f"Bearer {admin_key}"
findings = []
for endpoint in admin_endpoints:
try:
resp = requests.get(
f"{self.base_url}{endpoint}",
headers=headers,
timeout=5,
)
if resp.status_code == 200:
findings.append({
"endpoint": endpoint,
"accessible": True,
"severity": "high",
"response_preview": str(resp.text)[:200],
})
except requests.RequestException:
pass
return {"admin_endpoints": findings}
def check_model_access(self) -> Dict:
"""Enumerate accessible models and their configurations."""
try:
resp = requests.get(f"{self.base_url}/v1/models", timeout=5)
if resp.status_code == 200:
models = resp.json().get("data", [])
return {
"models_exposed": True,
"model_count": len(models),
"models": [m.get("id") for m in models[:20]],
"severity": "medium",
"finding": "Model list is publicly accessible — reveals infrastructure details",
}
except requests.RequestException:
pass
return {"models_exposed": False}
def check_spend_tracking(self) -> Dict:
"""Check if spend tracking data is accessible."""
try:
resp = requests.get(f"{self.base_url}/global/spend", timeout=5)
if resp.status_code == 200:
return {
"spend_exposed": True,
"severity": "high",
"finding": "Global spend data is publicly accessible",
"data_preview": str(resp.text)[:200],
}
except requests.RequestException:
pass
return {"spend_exposed": False}
def full_scan(self) -> Dict:
"""Run a comprehensive security scan."""
return {
"target": self.base_url,
"authentication": self.check_authentication(),
"admin_access": self.check_admin_endpoints(),
"model_exposure": self.check_model_access(),
"spend_exposure": self.check_spend_tracking(),
        }

Credential Storage Vulnerabilities
LiteLLM stores provider API keys in its configuration file or environment variables. The configuration file often contains plaintext API keys:
# Example LiteLLM config.yaml — note the plaintext API keys
# This is how many deployments are configured
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXX # OpenAI key
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: sk-ant-XXXXXXXXXXXXXXXXXXXXXXXXXX # Anthropic key
- model_name: llama-local
litellm_params:
model: openai/llama-3
api_base: http://vllm-server:8000/v1
general_settings:
master_key: sk-litellm-master-XXXX # Admin key
  database_url: postgresql://litellm:password@db:5432/litellm

import os
from pathlib import Path
from typing import Dict, List
import re
class LiteLLMConfigAuditor:
"""Audit LiteLLM configuration for security issues."""
SECRET_PATTERNS = [
(r"sk-[a-zA-Z0-9]{20,}", "OpenAI API key"),
(r"sk-ant-[a-zA-Z0-9-]{20,}", "Anthropic API key"),
(r"AIza[a-zA-Z0-9_-]{35}", "Google API key"),
        (r"\b[a-f0-9]{32}\b", "Potential API key (32-char hex)"),
(r"password['\"]?\s*[:=]\s*['\"]?[^\s'\"]+", "Hardcoded password"),
]
def audit_config_file(self, config_path: str) -> List[Dict]:
"""Audit a LiteLLM config file for security issues."""
findings = []
path = Path(config_path)
if not path.exists():
return [{"severity": "error", "finding": f"Config file not found: {config_path}"}]
content = path.read_text()
# Check for plaintext secrets
for pattern, description in self.SECRET_PATTERNS:
matches = re.findall(pattern, content)
for match in matches:
findings.append({
"severity": "critical",
"finding": f"Plaintext secret found: {description}",
"value_preview": match[:8] + "..." + match[-4:] if len(match) > 12 else "***",
"remediation": "Use environment variables or a secrets manager instead",
})
# Check file permissions
file_stat = os.stat(config_path)
if file_stat.st_mode & 0o077:
findings.append({
"severity": "high",
"finding": "Config file is readable by group/others",
"permissions": oct(file_stat.st_mode)[-3:],
"remediation": "Set permissions to 600 (owner read/write only)",
})
# Check for database URL with embedded credentials
if re.search(r"database_url.*://\w+:\w+@", content):
findings.append({
"severity": "high",
"finding": "Database URL contains embedded credentials",
"remediation": "Use environment variable for database URL",
})
# Check for master key strength
master_key_match = re.search(r"master_key['\"]?\s*[:=]\s*['\"]?(\S+)", content)
if master_key_match:
key = master_key_match.group(1).strip("'\"")
if len(key) < 32:
findings.append({
"severity": "high",
"finding": "Master key is too short",
"key_length": len(key),
"remediation": "Use a master key of at least 32 characters",
})
return findings
def audit_environment(self) -> List[Dict]:
"""Audit environment variables for LiteLLM security."""
findings = []
required_vars = [
"LITELLM_MASTER_KEY",
"DATABASE_URL",
]
for var in required_vars:
value = os.environ.get(var)
if value is None:
findings.append({
"severity": "medium",
"finding": f"Environment variable {var} not set",
})
elif var == "LITELLM_MASTER_KEY" and len(value) < 32:
findings.append({
"severity": "high",
"finding": f"{var} is too short ({len(value)} chars)",
})
        return findings

Common Vulnerability Patterns
SSRF via Custom API Base
LLM proxies support configuring custom api_base URLs to route requests to different backends. If the proxy does not validate these URLs, an attacker with configuration access can route requests to internal services:
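As a hypothetical illustration, a poisoned model entry might point its api_base at a cloud metadata endpoint (the model name internal-probe is invented for this example):

```python
# Hypothetical malicious model entry: completions sent to "internal-probe" are
# forwarded by the proxy to the AWS instance metadata service rather than an
# LLM API, turning the proxy into an SSRF primitive that relays internal data.
malicious_model_entry = {
    "model_name": "internal-probe",
    "litellm_params": {
        "model": "openai/anything",
        "api_base": "http://169.254.169.254/latest/meta-data/",  # AWS IMDS
    },
}
```

A URL validator like the one below should reject such an entry before it is ever written to the routing configuration: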
from typing import Dict
from urllib.parse import urlparse
class SSRFValidator:
"""Validate API base URLs to prevent SSRF attacks."""
BLOCKED_HOSTS = {
"169.254.169.254", # AWS metadata
"metadata.google.internal", # GCP metadata
"100.100.100.200", # Alibaba Cloud metadata
"localhost",
"127.0.0.1",
"0.0.0.0",
"::1",
}
BLOCKED_SCHEMES = {"file", "ftp", "gopher", "dict"}
BLOCKED_PORTS = {22, 25, 445, 3389, 5432, 3306, 6379, 27017}
def validate_api_base(self, url: str) -> Dict:
"""Validate an API base URL for SSRF risks."""
try:
parsed = urlparse(url)
except Exception:
return {"valid": False, "reason": "Invalid URL format"}
# Check scheme
if parsed.scheme in self.BLOCKED_SCHEMES:
return {"valid": False, "reason": f"Blocked scheme: {parsed.scheme}"}
if parsed.scheme not in ("http", "https"):
return {"valid": False, "reason": f"Unsupported scheme: {parsed.scheme}"}
# Check host
hostname = parsed.hostname or ""
if hostname in self.BLOCKED_HOSTS:
return {"valid": False, "reason": f"Blocked host: {hostname}"}
# Check for IP address in private ranges
try:
import ipaddress
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local:
return {"valid": False, "reason": f"Private/internal IP: {hostname}"}
except ValueError:
pass # Not an IP address — hostname is fine
# Check port
port = parsed.port
if port and port in self.BLOCKED_PORTS:
return {"valid": False, "reason": f"Blocked port: {port}"}
        return {"valid": True}

Prompt Logging and Data Exposure
LLM proxies log prompts and completions for usage tracking, debugging, and cost allocation. This creates a concentrated data store of sensitive information:
import json
import os
import re
from typing import Dict, List
from pathlib import Path
class LogSecurityAuditor:
"""Audit LLM proxy logs for sensitive data exposure."""
PII_PATTERNS = {
"email": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
"phone": re.compile(r"\b(?:\+?1[-.]?)?\(?\d{3}\)?[-.]?\d{3}[-.]?\d{4}\b"),
"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
"credit_card": re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b"),
"api_key": re.compile(r"\b(?:sk-[a-zA-Z0-9]{20,}|AIza[a-zA-Z0-9_-]{35})\b"),
}
def scan_log_file(self, log_path: str, sample_size: int = 1000) -> Dict:
"""Scan proxy logs for PII and sensitive data."""
findings = {pattern: 0 for pattern in self.PII_PATTERNS}
lines_scanned = 0
with open(log_path) as f:
for i, line in enumerate(f):
if i >= sample_size:
break
lines_scanned += 1
for pattern_name, pattern in self.PII_PATTERNS.items():
matches = pattern.findall(line)
findings[pattern_name] += len(matches)
return {
"log_file": log_path,
"lines_scanned": lines_scanned,
"pii_findings": {k: v for k, v in findings.items() if v > 0},
"severity": "critical" if any(findings.values()) else "info",
"recommendation": "Enable PII scrubbing in proxy logging configuration",
}
def check_log_permissions(self, log_dir: str) -> List[Dict]:
"""Check permissions on proxy log files."""
findings = []
for log_file in Path(log_dir).glob("*.log*"):
stat = os.stat(log_file)
if stat.st_mode & 0o077:
findings.append({
"severity": "high",
"file": str(log_file),
"finding": "Log file readable by group/others",
"permissions": oct(stat.st_mode)[-3:],
})
        return findings

Virtual Key Bypass
LiteLLM supports virtual keys that map to underlying provider keys. If the virtual key system has implementation flaws, attackers may be able to bypass key restrictions:
import requests
from typing import Dict
class VirtualKeyTester:
"""Test virtual key enforcement in LLM proxies."""
def __init__(self, proxy_url: str):
self.base_url = proxy_url.rstrip("/")
    def test_key_bypass(self) -> Dict:
"""Test for common virtual key bypass patterns."""
results = {"bypasses_found": []}
bypass_attempts = [
# Try with no key
("no_key", {}),
# Try with empty bearer token
("empty_bearer", {"Authorization": "Bearer "}),
# Try with 'null' key
("null_key", {"Authorization": "Bearer null"}),
# Try with the master key format
("master_format", {"Authorization": "Bearer sk-litellm-master-key"}),
]
test_payload = {
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 5,
}
for name, headers in bypass_attempts:
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
if resp.status_code == 200:
results["bypasses_found"].append({
"method": name,
"severity": "critical",
"finding": f"Authentication bypassed with method: {name}",
})
except requests.RequestException:
pass
return results
def test_key_scope_enforcement(
self,
limited_key: str,
allowed_model: str,
blocked_model: str,
) -> Dict:
"""Test if model-scoped virtual keys are properly enforced."""
headers = {"Authorization": f"Bearer {limited_key}"}
test_payload = {
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 5,
}
results = {}
# Test allowed model
test_payload["model"] = allowed_model
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
results["allowed_model"] = {
"model": allowed_model,
"status": resp.status_code,
"accessible": resp.status_code == 200,
}
except requests.RequestException as e:
results["allowed_model"] = {"error": str(e)}
# Test blocked model
test_payload["model"] = blocked_model
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
results["blocked_model"] = {
"model": blocked_model,
"status": resp.status_code,
"accessible": resp.status_code == 200,
"severity": "high" if resp.status_code == 200 else "info",
}
except requests.RequestException as e:
results["blocked_model"] = {"error": str(e)}
        return results

Hardening LLM Proxy Deployments
Secure Configuration
# Hardened LiteLLM config.yaml
# API keys should be in environment variables, not here
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: os.environ/OPENAI_API_KEY # Reference env var
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
database_url: os.environ/DATABASE_URL
# Security settings
allow_user_auth: true
enforce_user_param: true
max_budget: 100.0 # Global max spend in USD
budget_duration: "monthly"
litellm_settings:
# Drop excessive parameters that could be used for abuse
drop_params: true
# Set default limits
max_tokens: 4096
# Enable request logging
success_callback: ["langfuse"] # External logging service
  failure_callback: ["langfuse"]

#!/bin/bash
# Secure LiteLLM deployment with Docker
docker run -d \
--name litellm-proxy \
--restart unless-stopped \
-p 127.0.0.1:4000:4000 \
-v /etc/litellm/config.yaml:/app/config.yaml:ro \
-e OPENAI_API_KEY="${OPENAI_API_KEY}" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-e LITELLM_MASTER_KEY="${LITELLM_MASTER_KEY}" \
-e DATABASE_URL="${DATABASE_URL}" \
--read-only \
--tmpfs /tmp:noexec,nosuid,size=100m \
--security-opt no-new-privileges \
--cap-drop ALL \
--memory 2g \
--cpus 2 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml \
  --port 4000

Defensive Recommendations
- Always enable authentication — set a strong master key and use virtual keys for all clients
- Store API keys in a secrets manager — never in config files or environment files checked into version control
- Bind to localhost and use a reverse proxy with TLS for external access
- Validate API base URLs to prevent SSRF through custom model endpoints
- Implement PII scrubbing in proxy logs — prompts contain sensitive business data
- Set spending limits per virtual key to prevent billing abuse
- Monitor for anomalous usage patterns — sudden spikes in requests or token consumption
- Restrict admin endpoints to specific IP ranges or networks
- Audit virtual key permissions regularly — ensure keys are scoped to only required models
- Encrypt the proxy database — it contains API keys, usage data, and potentially cached responses
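Several of these recommendations — virtual keys, model scoping, and per-key spend limits — come together when issuing keys through LiteLLM's /key/generate admin endpoint. The sketch below assumes the field names documented for LiteLLM's key-management API (models, max_budget, duration); verify them against the deployed version:

```python
import requests

def build_key_request(models: list, max_budget: float, duration: str = "30d") -> dict:
    """Payload for a virtual key scoped to specific models with a hard spend cap."""
    return {
        "models": models,          # key is only valid for these model aliases
        "max_budget": max_budget,  # USD spend limit enforced per key
        "duration": duration,      # key expires after this period
    }

def generate_scoped_key(proxy_url: str, master_key: str, payload: dict) -> dict:
    """Issue a scoped virtual key; this path requires the master key, so guard it."""
    resp = requests.post(
        f"{proxy_url.rstrip('/')}/key/generate",
        json=payload,
        headers={"Authorization": f"Bearer {master_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

Issuing every client a key built this way keeps the blast radius of a leaked key to one model set and a bounded spend.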
References
- LiteLLM Documentation — https://docs.litellm.ai/
- LiteLLM GitHub — https://github.com/BerriAI/litellm
- OWASP LLM Top 10 2025 — LLM06 (Excessive Agency), LLM10 (Unbounded Consumption)
- MITRE ATLAS — AML.T0044 (Full ML Model Access)
- NIST AI RMF — Govern 1.4 (Security controls for AI systems)