LLM Proxy Security
Security assessment of LLM proxy and gateway solutions including LiteLLM, Portkey, and custom API gateways.
Overview
LLM proxies and gateways sit between applications and LLM providers, abstracting the provider-specific API differences behind a unified interface. They enable organizations to switch between providers (OpenAI, Anthropic, Azure, local models), enforce rate limits, log usage, cache responses, and apply content policies — all without changing application code.
The security significance of LLM proxies is that they are credential aggregators and traffic brokers. A single proxy instance holds API keys for every LLM provider the organization uses, processes all prompts and completions, and has the ability to modify requests and responses in transit. Compromising the LLM proxy gives an attacker access to all provider credentials, visibility into all prompts (which may contain sensitive business data), and the ability to manipulate model responses.
Common LLM proxy solutions include LiteLLM (open-source, most widely deployed), Portkey, Helicone, and custom API gateways. This article covers the attack surface of LLM proxy architectures with a focus on LiteLLM (as the most prevalent open-source option), provides assessment techniques, and details hardening measures.
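To make the credential-aggregator point concrete, the routing table at the heart of a proxy can be sketched as a mapping from model aliases to provider endpoints and credentials (the structure and names below are illustrative, not LiteLLM internals):

```python
# Illustrative sketch of a proxy routing table: every model alias resolves to a
# provider endpoint plus the environment variable holding that provider's key.
# Compromising this one structure exposes every upstream credential reference.
PROVIDER_ROUTES = {
    "gpt-4": {"api_base": "https://api.openai.com/v1", "api_key_env": "OPENAI_API_KEY"},
    "claude-3": {"api_base": "https://api.anthropic.com", "api_key_env": "ANTHROPIC_API_KEY"},
    "llama-local": {"api_base": "http://vllm-server:8000/v1", "api_key_env": None},
}

def resolve_route(model: str) -> dict:
    """Resolve a model alias to its upstream route, as a proxy does per request."""
    route = PROVIDER_ROUTES.get(model)
    if route is None:
        raise KeyError(f"unknown model alias: {model}")
    return route
```

Because clients only ever see a model alias and a proxy-issued key, the provider credentials never leave the proxy — which is precisely why the proxy itself becomes the high-value target.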
LLM Proxy Architecture
Trust Boundary Analysis
┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Application │────▶│ LLM Proxy │────▶│ LLM Providers │
│ (Client) │ │ (LiteLLM, etc.) │ │ (OpenAI, etc.) │
└─────────────┘ │ │ └──────────────────┘
│ ┌──────────────┐ │
│ │ API Keys DB │ │ ┌──────────────────┐
│ │ Usage Logs │ │ │ Cache Backend │
│ │ Config │ │ │ (Redis, etc.) │
│ └──────────────┘ │ └──────────────────┘
└──────────────────┘
The proxy occupies a privileged position in the data flow:
| Data Access | What the Proxy Sees |
|---|---|
| Provider API keys | All keys for all providers — high-value target |
| Prompts | All user and system prompts, including sensitive business data |
| Completions | All model responses |
| Usage metadata | Who uses what model, how often, for what |
| Application identity | API keys/tokens of all client applications |
Common Proxy Solutions
| Solution | Type | Key Security Features | Default Authentication |
|---|---|---|---|
| LiteLLM | OSS Python | Virtual keys, spend tracking, logging | None (configurable) |
| Portkey | SaaS/OSS | Guardrails, caching, fallbacks | API key |
| Helicone | SaaS | Logging, caching, rate limiting | API key |
| Custom NGINX/Envoy | Infrastructure | TLS, basic auth, rate limiting | Configurable |
LiteLLM Security Assessment
Default Configuration Risks
LiteLLM's proxy server can be started with minimal configuration:
# Minimal LiteLLM proxy startup — examine the security implications
litellm --model gpt-4 --host 0.0.0.0 --port 4000
# Or with a config file:
litellm --config config.yaml

Binding to 0.0.0.0 without a master key exposes an unauthenticated proxy to the network. The scanner below probes a deployment for exactly this kind of exposure:

import requests
from typing import Dict, List, Any, Optional
class LiteLLMSecurityScanner:
"""Scan a LiteLLM proxy deployment for security issues."""
def __init__(self, proxy_url: str):
self.base_url = proxy_url.rstrip("/")
def check_authentication(self) -> Dict[str, Any]:
"""Test if the LiteLLM proxy requires authentication."""
endpoints = [
("/v1/models", "GET"),
("/v1/chat/completions", "POST"),
("/health", "GET"),
("/key/info", "GET"),
]
results = {"requires_auth": True, "open_endpoints": []}
for endpoint, method in endpoints:
try:
if method == "GET":
resp = requests.get(
f"{self.base_url}{endpoint}", timeout=5
)
else:
resp = requests.post(
f"{self.base_url}{endpoint}",
json={"model": "gpt-4", "messages": [{"role": "user", "content": "test"}]},
timeout=5,
)
if resp.status_code not in (401, 403):
results["requires_auth"] = False
results["open_endpoints"].append({
"endpoint": endpoint,
"method": method,
"status": resp.status_code,
})
except requests.RequestException:
pass
return results
def check_admin_endpoints(self, admin_key: Optional[str] = None) -> Dict:
"""Check for exposed admin endpoints."""
admin_endpoints = [
"/key/generate",
"/key/delete",
"/key/info",
"/user/new",
"/user/info",
"/model/info",
"/model/new",
"/global/spend",
"/global/spend/logs",
]
headers = {}
if admin_key:
headers["Authorization"] = f"Bearer {admin_key}"
findings = []
for endpoint in admin_endpoints:
try:
resp = requests.get(
f"{self.base_url}{endpoint}",
headers=headers,
timeout=5,
)
if resp.status_code == 200:
findings.append({
"endpoint": endpoint,
"accessible": True,
"severity": "high",
"response_preview": str(resp.text)[:200],
})
except requests.RequestException:
pass
return {"admin_endpoints": findings}
def check_model_access(self) -> Dict:
"""Enumerate accessible models and their configurations."""
try:
resp = requests.get(f"{self.base_url}/v1/models", timeout=5)
if resp.status_code == 200:
models = resp.json().get("data", [])
return {
"models_exposed": True,
"model_count": len(models),
"models": [m.get("id") for m in models[:20]],
"severity": "medium",
"finding": "Model list is publicly accessible — reveals infrastructure details",
}
except requests.RequestException:
pass
return {"models_exposed": False}
def check_spend_tracking(self) -> Dict:
"""Check if spend tracking data is accessible."""
try:
resp = requests.get(f"{self.base_url}/global/spend", timeout=5)
if resp.status_code == 200:
return {
"spend_exposed": True,
"severity": "high",
"finding": "Global spend data is publicly accessible",
"data_preview": str(resp.text)[:200],
}
except requests.RequestException:
pass
return {"spend_exposed": False}
def full_scan(self) -> Dict:
"""Run a comprehensive security scan."""
return {
"target": self.base_url,
"authentication": self.check_authentication(),
"admin_access": self.check_admin_endpoints(),
"model_exposure": self.check_model_access(),
"spend_exposure": self.check_spend_tracking(),
        }

Credential Storage Vulnerabilities
LiteLLM stores provider API keys in its configuration file or environment variables. The configuration file often contains plaintext API keys:
# Example LiteLLM config.yaml — note the plaintext API keys
# This is how many deployments are configured
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXX # OpenAI key
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: sk-ant-XXXXXXXXXXXXXXXXXXXXXXXXXX # Anthropic key
- model_name: llama-local
litellm_params:
model: openai/llama-3
api_base: http://vllm-server:8000/v1
general_settings:
master_key: sk-litellm-master-XXXX # Admin key
  database_url: postgresql://litellm:password@db:5432/litellm

import os
from pathlib import Path
from typing import Dict, List
import re
class LiteLLMConfigAuditor:
"""Audit LiteLLM configuration for security issues."""
SECRET_PATTERNS = [
(r"sk-[a-zA-Z0-9]{20,}", "OpenAI API key"),
(r"sk-ant-[a-zA-Z0-9-]{20,}", "Anthropic API key"),
(r"AIza[a-zA-Z0-9_-]{35}", "Google API key"),
        (r"\b[a-f0-9]{32}\b", "Potential API key (32-char hex)"),
(r"password['\"]?\s*[:=]\s*['\"]?[^\s'\"]+", "Hardcoded password"),
]
def audit_config_file(self, config_path: str) -> List[Dict]:
"""Audit a LiteLLM config file for security issues."""
findings = []
path = Path(config_path)
if not path.exists():
return [{"severity": "error", "finding": f"Config file not found: {config_path}"}]
content = path.read_text()
# Check for plaintext secrets
for pattern, description in self.SECRET_PATTERNS:
matches = re.findall(pattern, content)
for match in matches:
findings.append({
"severity": "critical",
"finding": f"Plaintext secret found: {description}",
"value_preview": match[:8] + "..." + match[-4:] if len(match) > 12 else "***",
"remediation": "Use environment variables or a secrets manager instead",
})
# Check file permissions
file_stat = os.stat(config_path)
if file_stat.st_mode & 0o077:
findings.append({
"severity": "high",
"finding": "Config file is readable by group/others",
"permissions": oct(file_stat.st_mode)[-3:],
"remediation": "Set permissions to 600 (owner read/write only)",
})
# Check for database URL with embedded credentials
if re.search(r"database_url.*://\w+:\w+@", content):
findings.append({
"severity": "high",
"finding": "Database URL contains embedded credentials",
"remediation": "Use environment variable for database URL",
})
# Check for master key strength
master_key_match = re.search(r"master_key['\"]?\s*[:=]\s*['\"]?(\S+)", content)
if master_key_match:
key = master_key_match.group(1).strip("'\"")
if len(key) < 32:
findings.append({
"severity": "high",
"finding": "Master key is too short",
"key_length": len(key),
"remediation": "Use a master key of at least 32 characters",
})
return findings
def audit_environment(self) -> List[Dict]:
"""Audit environment variables for LiteLLM security."""
findings = []
required_vars = [
"LITELLM_MASTER_KEY",
"DATABASE_URL",
]
for var in required_vars:
value = os.environ.get(var)
if value is None:
findings.append({
"severity": "medium",
"finding": f"Environment variable {var} not set",
})
elif var == "LITELLM_MASTER_KEY" and len(value) < 32:
findings.append({
"severity": "high",
"finding": f"{var} is too short ({len(value)} chars)",
})
        return findings

Common Vulnerability Patterns
SSRF via Custom API Base
LLM proxies support configuring custom api_base URLs to route requests to different backends. If the proxy does not validate these URLs, an attacker with configuration access can route requests to internal services:
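As a hypothetical illustration, a poisoned model entry might point its api_base at a cloud metadata endpoint (the model name internal-probe is invented for this example):

```python
# Hypothetical malicious model entry: completions sent to "internal-probe" are
# forwarded by the proxy to the AWS instance metadata service rather than an
# LLM API, turning the proxy into an SSRF primitive that relays internal data.
malicious_model_entry = {
    "model_name": "internal-probe",
    "litellm_params": {
        "model": "openai/anything",
        "api_base": "http://169.254.169.254/latest/meta-data/",  # AWS IMDS
    },
}
```

A URL validator like the one below should reject such an entry before it is ever written to the routing configuration: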
from typing import Dict
from urllib.parse import urlparse
class SSRFValidator:
"""Validate API base URLs to prevent SSRF attacks."""
BLOCKED_HOSTS = {
"169.254.169.254", # AWS metadata
"metadata.google.internal", # GCP metadata
"100.100.100.200", # Alibaba Cloud metadata
"localhost",
"127.0.0.1",
"0.0.0.0",
"::1",
}
BLOCKED_SCHEMES = {"file", "ftp", "gopher", "dict"}
BLOCKED_PORTS = {22, 25, 445, 3389, 5432, 3306, 6379, 27017}
def validate_api_base(self, url: str) -> Dict:
"""Validate an API base URL for SSRF risks."""
try:
parsed = urlparse(url)
except Exception:
return {"valid": False, "reason": "Invalid URL format"}
# Check scheme
if parsed.scheme in self.BLOCKED_SCHEMES:
return {"valid": False, "reason": f"Blocked scheme: {parsed.scheme}"}
if parsed.scheme not in ("http", "https"):
return {"valid": False, "reason": f"Unsupported scheme: {parsed.scheme}"}
# Check host
hostname = parsed.hostname or ""
if hostname in self.BLOCKED_HOSTS:
return {"valid": False, "reason": f"Blocked host: {hostname}"}
# Check for IP address in private ranges
try:
import ipaddress
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local:
return {"valid": False, "reason": f"Private/internal IP: {hostname}"}
except ValueError:
pass # Not an IP address — hostname is fine
# Check port
port = parsed.port
if port and port in self.BLOCKED_PORTS:
return {"valid": False, "reason": f"Blocked port: {port}"}
        return {"valid": True}

Prompt Logging and Data Exposure
LLM proxies log prompts and completions for usage tracking, debugging, and cost allocation. This creates a concentrated data store of sensitive information:
import json
import os
import re
from typing import Dict, List
from pathlib import Path
class LogSecurityAuditor:
"""Audit LLM proxy logs for sensitive data exposure."""
PII_PATTERNS = {
"email": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
"phone": re.compile(r"\b(?:\+?1[-.]?)?\(?\d{3}\)?[-.]?\d{3}[-.]?\d{4}\b"),
"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
"credit_card": re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b"),
"api_key": re.compile(r"\b(?:sk-[a-zA-Z0-9]{20,}|AIza[a-zA-Z0-9_-]{35})\b"),
}
def scan_log_file(self, log_path: str, sample_size: int = 1000) -> Dict:
"""Scan proxy logs for PII and sensitive data."""
findings = {pattern: 0 for pattern in self.PII_PATTERNS}
lines_scanned = 0
with open(log_path) as f:
for i, line in enumerate(f):
if i >= sample_size:
break
lines_scanned += 1
for pattern_name, pattern in self.PII_PATTERNS.items():
matches = pattern.findall(line)
findings[pattern_name] += len(matches)
return {
"log_file": log_path,
"lines_scanned": lines_scanned,
"pii_findings": {k: v for k, v in findings.items() if v > 0},
"severity": "critical" if any(findings.values()) else "info",
"recommendation": "Enable PII scrubbing in proxy logging configuration",
}
def check_log_permissions(self, log_dir: str) -> List[Dict]:
"""Check permissions on proxy log files."""
findings = []
for log_file in Path(log_dir).glob("*.log*"):
stat = os.stat(log_file)
if stat.st_mode & 0o077:
findings.append({
"severity": "high",
"file": str(log_file),
"finding": "Log file readable by group/others",
"permissions": oct(stat.st_mode)[-3:],
})
        return findings

Virtual Key Bypass
LiteLLM supports virtual keys that map to underlying provider keys. If the virtual key system has implementation flaws, attackers may be able to bypass key restrictions:
import requests
from typing import Dict
class VirtualKeyTester:
"""Test virtual key enforcement in LLM proxies."""
def __init__(self, proxy_url: str):
self.base_url = proxy_url.rstrip("/")
    def test_key_bypass(self) -> Dict:
"""Test for common virtual key bypass patterns."""
results = {"bypasses_found": []}
bypass_attempts = [
# Try with no key
("no_key", {}),
# Try with empty bearer token
("empty_bearer", {"Authorization": "Bearer "}),
# Try with 'null' key
("null_key", {"Authorization": "Bearer null"}),
# Try with the master key format
("master_format", {"Authorization": "Bearer sk-litellm-master-key"}),
]
test_payload = {
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 5,
}
for name, headers in bypass_attempts:
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
if resp.status_code == 200:
results["bypasses_found"].append({
"method": name,
"severity": "critical",
"finding": f"Authentication bypassed with method: {name}",
})
except requests.RequestException:
pass
return results
def test_key_scope_enforcement(
self,
limited_key: str,
allowed_model: str,
blocked_model: str,
) -> Dict:
"""Test if model-scoped virtual keys are properly enforced."""
headers = {"Authorization": f"Bearer {limited_key}"}
test_payload = {
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 5,
}
results = {}
# Test allowed model
test_payload["model"] = allowed_model
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
results["allowed_model"] = {
"model": allowed_model,
"status": resp.status_code,
"accessible": resp.status_code == 200,
}
except requests.RequestException as e:
results["allowed_model"] = {"error": str(e)}
# Test blocked model
test_payload["model"] = blocked_model
try:
resp = requests.post(
f"{self.base_url}/v1/chat/completions",
json=test_payload,
headers=headers,
timeout=10,
)
results["blocked_model"] = {
"model": blocked_model,
"status": resp.status_code,
"accessible": resp.status_code == 200,
"severity": "high" if resp.status_code == 200 else "info",
}
except requests.RequestException as e:
results["blocked_model"] = {"error": str(e)}
        return results

Hardening LLM Proxy Deployments
Secure Configuration
# Hardened LiteLLM config.yaml
# API keys should be in environment variables, not here
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: os.environ/OPENAI_API_KEY # Reference env var
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
database_url: os.environ/DATABASE_URL
# Security settings
allow_user_auth: true
enforce_user_param: true
max_budget: 100.0 # Global max spend in USD
budget_duration: "monthly"
litellm_settings:
# Drop excessive parameters that could be used for abuse
drop_params: true
# Set default limits
max_tokens: 4096
# Enable request logging
success_callback: ["langfuse"] # External logging service
  failure_callback: ["langfuse"]

#!/bin/bash
# Secure LiteLLM deployment with Docker
docker run -d \
--name litellm-proxy \
--restart unless-stopped \
-p 127.0.0.1:4000:4000 \
-v /etc/litellm/config.yaml:/app/config.yaml:ro \
-e OPENAI_API_KEY="${OPENAI_API_KEY}" \
-e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
-e LITELLM_MASTER_KEY="${LITELLM_MASTER_KEY}" \
-e DATABASE_URL="${DATABASE_URL}" \
--read-only \
--tmpfs /tmp:noexec,nosuid,size=100m \
--security-opt no-new-privileges \
--cap-drop ALL \
--memory 2g \
--cpus 2 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml \
  --port 4000

Defensive Recommendations
- Always enable authentication — set a strong master key and use virtual keys for all clients
- Store API keys in a secrets manager — never in config files or environment files checked into version control
- Bind to localhost and use a reverse proxy with TLS for external access
- Validate API base URLs to prevent SSRF through custom model endpoints
- Implement PII scrubbing in proxy logs — prompts contain sensitive business data
- Set spending limits per virtual key to prevent billing abuse
- Monitor for anomalous usage patterns — sudden spikes in requests or token consumption
- Restrict admin endpoints to specific IP ranges or networks
- Audit virtual key permissions regularly — ensure keys are scoped to only required models
- Encrypt the proxy database — it contains API keys, usage data, and potentially cached responses
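Several of these recommendations — virtual keys, model scoping, and per-key spend limits — come together when issuing keys through LiteLLM's /key/generate admin endpoint. The sketch below assumes the field names documented for LiteLLM's key-management API (models, max_budget, duration); verify them against the deployed version:

```python
import requests

def build_key_request(models: list, max_budget: float, duration: str = "30d") -> dict:
    """Payload for a virtual key scoped to specific models with a hard spend cap."""
    return {
        "models": models,          # key is only valid for these model aliases
        "max_budget": max_budget,  # USD spend limit enforced per key
        "duration": duration,      # key expires after this period
    }

def generate_scoped_key(proxy_url: str, master_key: str, payload: dict) -> dict:
    """Issue a scoped virtual key; this path requires the master key, so guard it."""
    resp = requests.post(
        f"{proxy_url.rstrip('/')}/key/generate",
        json=payload,
        headers={"Authorization": f"Bearer {master_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

Issuing every client a key built this way keeps the blast radius of a leaked key to one model set and a bounded spend.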
References
- LiteLLM Documentation — https://docs.litellm.ai/
- LiteLLM GitHub — https://github.com/BerriAI/litellm
- OWASP LLM Top 10 2025 — LLM06 (Excessive Agency), LLM10 (Unbounded Consumption)
- MITRE ATLAS — AML.T0044 (Full ML Model Access)
- NIST AI RMF — Govern 1.4 (Security controls for AI systems)