LLM Proxy Security
Security assessment of LLM proxy and gateway solutions, including LiteLLM, Portkey, and custom API gateways.
Overview
LLM proxies and gateways sit between applications and LLM providers and abstract provider-specific API differences behind a uniform interface. They let organizations switch between providers (OpenAI, Anthropic, Azure, local models), enforce rate limits, log usage, cache responses, and apply content policies, all without changing application code.
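To make the uniform interface concrete, here is a minimal sketch of how a client might address such a proxy. It assumes the proxy exposes an OpenAI-compatible `/v1/chat/completions` route; the URL `http://localhost:4000`, the key `sk-virtual-example`, and the helper `build_chat_request` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict


def build_chat_request(proxy_url: str, virtual_key: str,
                       model: str, prompt: str) -> Dict[str, Any]:
    """Build an OpenAI-style chat request aimed at the proxy.

    Hypothetical helper: the application only knows the proxy URL and a
    proxy-issued key, never a provider key.
    """
    return {
        "url": f"{proxy_url.rstrip('/')}/v1/chat/completions",
        "headers": {"Authorization": f"Bearer {virtual_key}"},
        "json": {
            "model": model,  # proxy-side alias, e.g. "gpt-4" or "claude-3"
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Switching providers changes only the model alias; URL and key stay the same.
openai_req = build_chat_request("http://localhost:4000", "sk-virtual-example",
                                "gpt-4", "Hello")
claude_req = build_chat_request("http://localhost:4000", "sk-virtual-example",
                                "claude-3", "Hello")
```

Sending either request is then a single `requests.post(req["url"], headers=req["headers"], json=req["json"], timeout=10)` call. The same sketch also shows the concentration of risk: every prompt, and the one key that unlocks every model, passes through a single endpoint.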
LLM proxies matter for security because they are credential aggregators and traffic brokers. A single proxy instance holds API keys for every LLM provider the organization uses, processes all prompts and completions, and can modify requests and responses in transit. Compromising the proxy gives an attacker access to all provider credentials, visibility into all prompts (which may contain sensitive business data), and the ability to manipulate model responses.
Common LLM proxy solutions include LiteLLM (open source, most widely deployed), Portkey, Helicone, and custom API gateways. This article covers the attack surface of LLM proxy architectures with a focus on LiteLLM (as the most common open-source option), presents assessment techniques, and details hardening measures.
LLM proxy architecture
Trust boundary analysis
```
┌─────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Application │────▶│    LLM Proxy     │────▶│  LLM Providers   │
│  (Client)   │     │ (LiteLLM, etc.)  │     │  (OpenAI, etc.)  │
└─────────────┘     │                  │     └──────────────────┘
                    │ ┌──────────────┐ │
                    │ │ API Keys DB  │ │     ┌──────────────────┐
                    │ │ Usage Logs   │ │     │  Cache Backend   │
                    │ │ Config       │ │     │  (Redis, etc.)   │
                    │ └──────────────┘ │     └──────────────────┘
                    └──────────────────┘
```
The proxy occupies a privileged position in the data flow:
| Data access | What the proxy sees |
|---|---|
| Provider API keys | All keys for all providers; a high-value target |
| Prompts | All user and system prompts, including sensitive business data |
| Completions | All model responses |
| Usage metadata | Who uses which model, how often, and for what |
| Application identity | API keys/tokens of all client applications |
Common proxy solutions
| Solution | Type | Key security features | Default authentication |
|---|---|---|---|
| LiteLLM | OSS Python | Virtual keys, spend tracking, logging | None (configurable) |
| Portkey | SaaS/OSS | Guardrails, caching, fallbacks | API key |
| Helicone | SaaS | Logging, caching, rate limiting | API key |
| Custom NGINX/Envoy | Infrastructure | TLS, basic auth, rate limiting | Configurable |
Security assessment of LiteLLM
Risks of the default configuration
LiteLLM's proxy server can be started with minimal configuration:
```bash
# Minimal LiteLLM proxy startup; examine the security implications.
# Binding to 0.0.0.0 exposes the proxy on every network interface.
litellm --model gpt-4 --host 0.0.0.0 --port 4000

# Or with a config file:
litellm --config config.yaml
```

```python
from typing import Any, Dict, Optional

import requests


class LiteLLMSecurityScanner:
    """Scan a LiteLLM proxy deployment for security issues."""

    def __init__(self, proxy_url: str):
        self.base_url = proxy_url.rstrip("/")

    def check_authentication(self) -> Dict[str, Any]:
        """Test if the LiteLLM proxy requires authentication."""
        endpoints = [
            ("/v1/models", "GET"),
            ("/v1/chat/completions", "POST"),
            ("/health", "GET"),
            ("/key/info", "GET"),
        ]
        results: Dict[str, Any] = {"requires_auth": True, "open_endpoints": []}
        reachable = False
        for endpoint, method in endpoints:
            try:
                if method == "GET":
                    resp = requests.get(f"{self.base_url}{endpoint}", timeout=5)
                else:
                    resp = requests.post(
                        f"{self.base_url}{endpoint}",
                        json={"model": "gpt-4", "messages": [{"role": "user", "content": "test"}]},
                        timeout=5,
                    )
            except requests.RequestException:
                continue  # Endpoint unreachable; try the next one
            reachable = True
            # Any status other than 401/403 means the request was not rejected for auth
            if resp.status_code not in (401, 403):
                results["requires_auth"] = False
                results["open_endpoints"].append({
                    "endpoint": endpoint,
                    "method": method,
                    "status": resp.status_code,
                })
        if not reachable:
            # No endpoint answered, so nothing can be concluded about authentication
            results["requires_auth"] = None
        return results

    def check_admin_endpoints(self, admin_key: Optional[str] = None) -> Dict[str, Any]:
        """Check for exposed admin endpoints.

        Only read-only GET probes are sent. Endpoints that accept POST only
        (for example /key/generate) may answer 405 here, so a clean result
        does not prove that they are protected.
        """
        admin_endpoints = [
            "/key/generate",
            "/key/delete",
            "/key/info",
            "/user/new",
            "/user/info",
            "/model/info",
            "/model/new",
            "/global/spend",
            "/global/spend/logs",
        ]
        headers = {}
        if admin_key:
            headers["Authorization"] = f"Bearer {admin_key}"
        findings = []
        for endpoint in admin_endpoints:
            try:
                resp = requests.get(
                    f"{self.base_url}{endpoint}",
                    headers=headers,
                    timeout=5,
                )
            except requests.RequestException:
                continue
            if resp.status_code == 200:
                findings.append({
                    "endpoint": endpoint,
                    "accessible": True,
                    "severity": "high",
                    "response_preview": resp.text[:200],
                })
        return {"admin_endpoints": findings}

    def check_model_access(self) -> Dict[str, Any]:
        """Enumerate accessible models and their configurations."""
        try:
            resp = requests.get(f"{self.base_url}/v1/models", timeout=5)
            if resp.status_code == 200:
                models = resp.json().get("data", [])
                return {
                    "models_exposed": True,
                    "model_count": len(models),
                    "models": [m.get("id") for m in models[:20]],
                    "severity": "medium",
                    "finding": "Model list is publicly accessible and reveals infrastructure details",
                }
        except (requests.RequestException, ValueError):
            pass  # Unreachable, or the body was not valid JSON
        return {"models_exposed": False}

    def check_spend_tracking(self) -> Dict[str, Any]:
        """Check if spend tracking data is accessible."""
        try:
            resp = requests.get(f"{self.base_url}/global/spend", timeout=5)
            if resp.status_code == 200:
                return {
                    "spend_exposed": True,
                    "severity": "high",
                    "finding": "Global spend data is publicly accessible",
                    "data_preview": resp.text[:200],
                }
        except requests.RequestException:
            pass
        return {"spend_exposed": False}

    def full_scan(self) -> Dict[str, Any]:
        """Run a comprehensive security scan."""
        return {
            "target": self.base_url,
            "authentication": self.check_authentication(),
            "admin_access": self.check_admin_endpoints(),
            "model_exposure": self.check_model_access(),
            "spend_exposure": self.check_spend_tracking(),
        }
```
Credential storage vulnerabilities
LiteLLM stores provider API keys in its configuration file or in environment variables. The configuration file often contains API keys in plaintext:
```yaml
# Example LiteLLM config.yaml; note the plaintext API keys.
# This is how many deployments are configured.
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXX  # OpenAI key
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: sk-ant-XXXXXXXXXXXXXXXXXXXXXXXXXX  # Anthropic key
  - model_name: llama-local
    litellm_params:
      model: openai/llama-3
      api_base: http://vllm-server:8000/v1

general_settings:
  master_key: sk-litellm-master-XXXX  # Admin key
  database_url: postgresql://litellm:password@db:5432/litellm
```

```python
import os
import re
from pathlib import Path
from typing import Any, Dict, List


class LiteLLMConfigAuditor:
    """Audit LiteLLM configuration for security issues."""

    SECRET_PATTERNS = [
        (r"sk-ant-[a-zA-Z0-9_-]{20,}", "Anthropic API key"),
        # The negative lookahead keeps Anthropic keys out of this bucket;
        # "-" and "_" are allowed so project keys (sk-proj-...) also match.
        (r"sk-(?!ant-)[a-zA-Z0-9_-]{20,}", "OpenAI-style API key"),
        (r"AIza[a-zA-Z0-9_-]{35}", "Google API key"),
        (r"[a-f0-9]{32}", "Potential API key (32-char hex)"),
        (r"password['\"]?\s*[:=]\s*['\"]?[^\s'\"]+", "Hardcoded password"),
    ]

    def audit_config_file(self, config_path: str) -> List[Dict[str, Any]]:
        """Audit a LiteLLM config file for security issues."""
        findings: List[Dict[str, Any]] = []
        path = Path(config_path)
        if not path.exists():
            return [{"severity": "error", "finding": f"Config file not found: {config_path}"}]
        content = path.read_text()

        # Check for plaintext secrets
        for pattern, description in self.SECRET_PATTERNS:
            for match in re.findall(pattern, content):
                findings.append({
                    "severity": "critical",
                    "finding": f"Plaintext secret found: {description}",
                    # Show only a short prefix and suffix, never the full secret
                    "value_preview": match[:8] + "..." + match[-4:] if len(match) > 12 else "***",
                    "remediation": "Use environment variables or a secrets manager instead",
                })

        # Check file permissions: any group/other bit set is too permissive
        mode = path.stat().st_mode
        if mode & 0o077:
            findings.append({
                "severity": "high",
                "finding": "Config file is accessible by group/others",
                "permissions": oct(mode)[-3:],
                "remediation": "Set permissions to 600 (owner read/write only)",
            })

        # Check for database URL with embedded credentials (user:password@host)
        if re.search(r"database_url.*://[^\s/:@]+:[^\s@]+@", content):
            findings.append({
                "severity": "high",
                "finding": "Database URL contains embedded credentials",
                "remediation": "Use environment variable for database URL",
            })

        # Check for master key strength
        master_key_match = re.search(r"master_key['\"]?\s*[:=]\s*['\"]?(\S+)", content)
        if master_key_match:
            key = master_key_match.group(1).strip("'\"")
            # An os.environ/ reference is a pointer, not the key itself
            if not key.startswith("os.environ/") and len(key) < 32:
                findings.append({
                    "severity": "high",
                    "finding": "Master key is too short",
                    "key_length": len(key),
                    "remediation": "Use a master key of at least 32 characters",
                })
        return findings

    def audit_environment(self) -> List[Dict[str, Any]]:
        """Audit environment variables for LiteLLM security."""
        findings: List[Dict[str, Any]] = []
        required_vars = [
            "LITELLM_MASTER_KEY",
            "DATABASE_URL",
        ]
        for var in required_vars:
            value = os.environ.get(var)
            if value is None:
                findings.append({
                    "severity": "medium",
                    "finding": f"Environment variable {var} not set",
                })
            elif var == "LITELLM_MASTER_KEY" and len(value) < 32:
                findings.append({
                    "severity": "high",
                    "finding": f"{var} is too short ({len(value)} chars)",
                })
        return findings
```
Common vulnerability patterns
SSRF via custom API base
LLM proxies support configuring custom api_base URLs to route requests to different backends. If the proxy does not validate these URLs, an attacker with configuration access can route requests to internal services such as cloud metadata endpoints. The SSRFValidator class in this section screens a URL by scheme, host, IP range, and port.
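One gap is worth sketching on its own: a check that inspects only the literal host string misses hostnames that resolve to internal addresses. The following is a minimal sketch, assuming the system resolver is acceptable for the check. `resolve_host` and `resolves_to_internal` are hypothetical helper names, and the `resolver` parameter exists only so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, List


def resolve_host(hostname: str) -> List[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); the IP is sockaddr[0].
    return sorted({info[4][0] for info in infos})


def resolves_to_internal(
    hostname: str,
    resolver: Callable[[str], List[str]] = resolve_host,
) -> bool:
    """Return True if any address the hostname resolves to is internal.

    Hostnames that fail to resolve are treated as unsafe (fail closed).
    """
    try:
        addresses = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return True
    if not addresses:
        return True
    for addr in addresses:
        # Strip an IPv6 zone identifier such as "%eth0" before parsing.
        ip = ipaddress.ip_address(addr.split("%")[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return True
    return False
```

A check like this narrows the gap but does not close it: the name can resolve differently between validation and the actual request (DNS rebinding), so a stricter design would connect to the vetted address rather than resolve again.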
```python
import ipaddress
from typing import Any, Dict
from urllib.parse import urlparse


class SSRFValidator:
    """Validate API base URLs to prevent SSRF attacks."""

    BLOCKED_HOSTS = {
        "169.254.169.254",           # AWS metadata
        "metadata.google.internal",  # GCP metadata
        "100.100.100.200",           # Alibaba Cloud metadata
        "localhost",
        "127.0.0.1",
        "0.0.0.0",
        "::1",
    }
    BLOCKED_SCHEMES = {"file", "ftp", "gopher", "dict"}
    BLOCKED_PORTS = {22, 25, 445, 3389, 5432, 3306, 6379, 27017}

    def validate_api_base(self, url: str) -> Dict[str, Any]:
        """Validate an API base URL for SSRF risks."""
        try:
            parsed = urlparse(url)
            port = parsed.port  # Raises ValueError for a malformed or out-of-range port
        except ValueError:
            return {"valid": False, "reason": "Invalid URL format"}

        # Check scheme
        if parsed.scheme in self.BLOCKED_SCHEMES:
            return {"valid": False, "reason": f"Blocked scheme: {parsed.scheme}"}
        if parsed.scheme not in ("http", "https"):
            return {"valid": False, "reason": f"Unsupported scheme: {parsed.scheme}"}

        # Check host (urlparse already lowercases the hostname)
        hostname = parsed.hostname or ""
        if not hostname:
            return {"valid": False, "reason": "Missing host"}
        if hostname in self.BLOCKED_HOSTS:
            return {"valid": False, "reason": f"Blocked host: {hostname}"}

        # Check for IP address in private ranges
        try:
            ip = ipaddress.ip_address(hostname)
        except ValueError:
            # Not an IP literal. Hostnames are not resolved here, so a name
            # that resolves to an internal address still passes this check.
            ip = None
        if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
            return {"valid": False, "reason": f"Private/internal IP: {hostname}"}

        # Check port
        if port and port in self.BLOCKED_PORTS:
            return {"valid": False, "reason": f"Blocked port: {port}"}
        return {"valid": True}
```
Prompt logging and data exposure
LLM proxies log prompts and completions for usage tracking, debugging, and cost allocation. This creates a concentrated store of sensitive information. The LogSecurityAuditor class in this section samples a log file for PII patterns and checks log file permissions.
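Detection after the fact is only half of the picture; the other half is keeping sensitive values out of the log in the first place. A minimal sketch of redaction before logging follows. The patterns are illustrative and far from exhaustive, and `redact` and `redact_messages` are hypothetical helpers, not part of any proxy's API.

```python
import re
from typing import Any, Dict, List

# Illustrative patterns only; real deployments need broader, locale-aware rules.
REDACTION_PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[a-zA-Z0-9_-]{20,}"),
}


def redact(text: str) -> str:
    """Replace each match with a typed placeholder such as [EMAIL]."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def redact_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of chat messages with string content redacted."""
    cleaned = []
    for message in messages:
        copy = dict(message)
        if isinstance(copy.get("content"), str):
            copy["content"] = redact(copy["content"])
        cleaned.append(copy)
    return cleaned
```

Calling `redact_messages` on the request payload just before it is handed to the logger keeps the original messages untouched for the upstream call while the stored copy carries placeholders only.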
```python
import re
from pathlib import Path
from typing import Any, Dict, List


class LogSecurityAuditor:
    """Audit LLM proxy logs for sensitive data exposure."""

    PII_PATTERNS = {
        "email": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
        "phone": re.compile(r"\b(?:\+?1[-.]?)?\(?\d{3}\)?[-.]?\d{3}[-.]?\d{4}\b"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "credit_card": re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b"),
        "api_key": re.compile(r"\b(?:sk-[a-zA-Z0-9_-]{20,}|AIza[a-zA-Z0-9_-]{35})\b"),
    }

    def scan_log_file(self, log_path: str, sample_size: int = 1000) -> Dict[str, Any]:
        """Scan the first `sample_size` lines of a proxy log for PII and sensitive data."""
        findings = {name: 0 for name in self.PII_PATTERNS}
        lines_scanned = 0
        # errors="replace" keeps the scan going on logs with invalid byte sequences
        with open(log_path, encoding="utf-8", errors="replace") as f:
            for i, line in enumerate(f):
                if i >= sample_size:
                    break
                lines_scanned += 1
                for name, pattern in self.PII_PATTERNS.items():
                    findings[name] += len(pattern.findall(line))
        return {
            "log_file": log_path,
            "lines_scanned": lines_scanned,
            "pii_findings": {k: v for k, v in findings.items() if v > 0},
            "severity": "critical" if any(findings.values()) else "info",
            "recommendation": "Enable PII scrubbing in proxy logging configuration",
        }

    def check_log_permissions(self, log_dir: str) -> List[Dict[str, Any]]:
        """Check permissions on proxy log files."""
        findings = []
        for log_file in Path(log_dir).glob("*.log*"):
            mode = log_file.stat().st_mode
            # Any group/other permission bit set is too permissive
            if mode & 0o077:
                findings.append({
                    "severity": "high",
                    "file": str(log_file),
                    "finding": "Log file accessible by group/others",
                    "permissions": oct(mode)[-3:],
                })
        return findings
```
Virtual key bypass
LiteLLM supports virtual keys that map to underlying provider keys. If the virtual key system has implementation flaws, attackers may be able to bypass key restrictions. The VirtualKeyTester class in this section probes for missing authentication and for weak model-scope enforcement.
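To clarify what "enforcement" should mean here, the sketch below models a virtual key as a record with a model scope and a budget, and denies by default. It is a toy in-memory model for illustration only and is not LiteLLM's implementation; `VirtualKeyRecord`, `VirtualKeyStore`, and the key `sk-virtual-example` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class VirtualKeyRecord:
    allowed_models: Set[str]
    max_budget_usd: float
    spend_usd: float = 0.0


class VirtualKeyStore:
    """Toy in-memory store; illustrative only, not LiteLLM's implementation."""

    def __init__(self) -> None:
        self._records: Dict[str, VirtualKeyRecord] = {}

    @staticmethod
    def _digest(key: str) -> str:
        # Store only a hash so a leaked store does not reveal usable keys.
        return hashlib.sha256(key.encode()).hexdigest()

    def register(self, key: str, record: VirtualKeyRecord) -> None:
        self._records[self._digest(key)] = record

    def authorize(self, key: Optional[str], model: str, est_cost_usd: float) -> str:
        """Return 'ok' or a denial reason. Every unmatched case is a denial."""
        if not key:
            return "denied: missing key"
        record = self._records.get(self._digest(key))
        if record is None:
            return "denied: unknown key"
        if model not in record.allowed_models:
            return "denied: model not in key scope"
        if record.spend_usd + est_cost_usd > record.max_budget_usd:
            return "denied: budget exceeded"
        return "ok"


store = VirtualKeyStore()
store.register(
    "sk-virtual-example",
    VirtualKeyRecord({"gpt-4"}, max_budget_usd=10.0, spend_usd=9.5),
)
```

The bypass patterns of interest are exactly the paths that skip one of these checks: an empty or placeholder key that is treated as valid, a model name that is never compared against the key's scope, or a budget that is tracked but not enforced.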
```python
from typing import Any, Dict

import requests


class VirtualKeyTester:
    """Test virtual key enforcement in LLM proxies."""

    def __init__(self, proxy_url: str):
        self.base_url = proxy_url.rstrip("/")

    def _post_chat(self, headers: Dict[str, str], model: str) -> requests.Response:
        """Send a minimal chat completion request through the proxy."""
        return requests.post(
            f"{self.base_url}/v1/chat/completions",
            json={
                "model": model,
                "messages": [{"role": "user", "content": "Hello"}],
                "max_tokens": 5,
            },
            headers=headers,
            timeout=10,
        )

    def test_key_bypass(self, valid_key: str) -> Dict[str, Any]:
        """Test for common virtual key bypass patterns."""
        results: Dict[str, Any] = {"bypasses_found": []}

        # Baseline: confirm the known-good key works, so that an empty
        # bypass list is not just the result of an unreachable proxy.
        try:
            baseline = self._post_chat({"Authorization": f"Bearer {valid_key}"}, "gpt-4")
            results["baseline_status"] = baseline.status_code
        except requests.RequestException as e:
            results["baseline_error"] = str(e)

        bypass_attempts = [
            # Try with no key
            ("no_key", {}),
            # Try with empty bearer token
            ("empty_bearer", {"Authorization": "Bearer "}),
            # Try with 'null' key
            ("null_key", {"Authorization": "Bearer null"}),
            # Try with the master key format
            ("master_format", {"Authorization": "Bearer sk-litellm-master-key"}),
        ]
        for name, headers in bypass_attempts:
            try:
                resp = self._post_chat(headers, "gpt-4")
            except requests.RequestException:
                continue
            if resp.status_code == 200:
                results["bypasses_found"].append({
                    "method": name,
                    "severity": "critical",
                    "finding": f"Authentication bypassed with method: {name}",
                })
        return results

    def test_key_scope_enforcement(
        self,
        limited_key: str,
        allowed_model: str,
        blocked_model: str,
    ) -> Dict[str, Any]:
        """Test if model-scoped virtual keys are properly enforced."""
        headers = {"Authorization": f"Bearer {limited_key}"}
        results: Dict[str, Any] = {}

        # Test allowed model: a 200 is the expected outcome
        try:
            resp = self._post_chat(headers, allowed_model)
            results["allowed_model"] = {
                "model": allowed_model,
                "status": resp.status_code,
                "accessible": resp.status_code == 200,
            }
        except requests.RequestException as e:
            results["allowed_model"] = {"error": str(e)}

        # Test blocked model: a 200 means the scope is not enforced
        try:
            resp = self._post_chat(headers, blocked_model)
            results["blocked_model"] = {
                "model": blocked_model,
                "status": resp.status_code,
                "accessible": resp.status_code == 200,
                "severity": "high" if resp.status_code == 200 else "info",
            }
        except requests.RequestException as e:
            results["blocked_model"] = {"error": str(e)}
        return results
```
Hardening LLM proxy deployments
Secure configuration
```yaml
# Hardened LiteLLM config.yaml
# API keys should be in environment variables, not here.
# Setting names and accepted values vary between LiteLLM releases;
# verify each one against the documentation for your version.
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY  # Reference env var
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  # Security settings
  allow_user_auth: true
  enforce_user_param: true

litellm_settings:
  # Global spend cap in USD and its reset interval (duration string such as 30d)
  max_budget: 100.0
  budget_duration: 30d
  # Drop request parameters that the target provider does not support
  drop_params: true
  # Set default limits
  max_tokens: 4096
  # Enable request logging
  success_callback: ["langfuse"]  # External logging service
  failure_callback: ["langfuse"]
```

```bash
#!/bin/bash
# Secure LiteLLM deployment with Docker.
# The port is published on localhost only; put a TLS reverse proxy in front.
# In production, pin the image to a specific release tag or digest instead of main-latest.
# Debug logging stays off as long as --detailed_debug is not passed.
docker run -d \
  --name litellm-proxy \
  --restart unless-stopped \
  -p 127.0.0.1:4000:4000 \
  -v /etc/litellm/config.yaml:/app/config.yaml:ro \
  -e OPENAI_API_KEY="${OPENAI_API_KEY}" \
  -e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
  -e LITELLM_MASTER_KEY="${LITELLM_MASTER_KEY}" \
  -e DATABASE_URL="${DATABASE_URL}" \
  --read-only \
  --tmpfs /tmp:noexec,nosuid,size=100m \
  --security-opt no-new-privileges \
  --cap-drop ALL \
  --memory 2g \
  --cpus 2 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml \
  --port 4000
```
Defensive recommendations
- Always enable authentication: set a strong master key and use virtual keys for all clients
- Store API keys in a secrets manager, never in configuration files or environment files that are checked into version control
- Bind to localhost and use a reverse proxy with TLS for external access
- Validate API base URLs to prevent SSRF via custom model endpoints
- Implement PII scrubbing in proxy logs, since prompts contain sensitive business data
- Set spend limits per virtual key to prevent billing abuse
- Monitor for anomalous usage patterns, such as sudden spikes in requests or token consumption
- Restrict admin endpoints to specific IP ranges or networks
- Audit virtual key permissions regularly and make sure keys are scoped to only the models they need
- Encrypt the proxy database, which holds API keys, usage data, and possibly cached responses
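The monitoring recommendation can be made concrete with a small baseline comparison. The sketch below is a simple z-score check over per-window request (or token) counts; `is_spike` is a hypothetical helper, and the threshold and minimum sample count are illustrative assumptions rather than tuned values.

```python
from statistics import mean, pstdev
from typing import List


def is_spike(history: List[int], current: int,
             z_threshold: float = 3.0, min_samples: int = 5) -> bool:
    """Flag `current` if it is far above the recent per-window baseline.

    history: counts from previous equal-length windows for one virtual key.
    """
    if len(history) < min_samples:
        return False  # Not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Perfectly flat history: fall back to a fixed "more than double" rule
        return current > 2 * baseline
    return (current - baseline) / spread > z_threshold


# Example: five quiet windows followed by a burst
recent_counts = [100, 110, 95, 105, 90]
burst_detected = is_spike(recent_counts, 500)
```

Running a check like this per virtual key, rather than only on global totals, makes it easier to tell a single leaked key from organic growth across all clients.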