API Key Compromise Investigation
Investigating AI API key compromise incidents including detection, scope assessment, usage forensics, and remediation procedures.
Overview
AI API keys are high-value targets for attackers. A compromised OpenAI, Anthropic, Google AI, or Azure OpenAI API key grants the attacker access to powerful language models at the victim's expense. The consequences range from financial (attackers have run up bills exceeding $100,000 in a single weekend using stolen GPT-4 keys) to operational (the attacker can use the key to process sensitive data through the model, potentially exfiltrating information from RAG-connected systems). In the worst case, a compromised key with fine-tuning permissions allows the attacker to poison models associated with the account.
API key compromise investigations differ from traditional credential compromise in several important ways. First, the "blast radius" depends heavily on what the key is connected to. A key used only for simple completions has a different risk profile than one wired into an agentic system with tool access. Second, the forensic artifacts are primarily API usage logs and billing records rather than system logs and network captures. Third, the attack surface for credential theft is broad: keys end up in git repositories, client-side code, CI/CD logs, environment variable dumps, and shared configuration files.
This article walks through the complete investigation lifecycle for AI API key compromise: initial detection, scope assessment, forensic reconstruction of attacker activity, and remediation.
Detection of API Key Compromise
Usage Anomaly Detection
The most reliable signal for API key compromise is anomalous usage. This manifests as spikes in request volume, unexpected model usage (the attacker using GPT-4-turbo when your application only uses GPT-3.5), requests from unfamiliar IP addresses, or usage outside of normal business hours.
import json
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from collections import defaultdict
from typing import Optional
@dataclass
class UsageRecord:
"""A single API usage record."""
timestamp: str
api_key_prefix: str # Last 4 chars or masked key
model: str
input_tokens: int
output_tokens: int
endpoint: str
source_ip: Optional[str] = None
user_agent: Optional[str] = None
status_code: int = 200
cost_usd: float = 0.0
@dataclass
class AnomalyAlert:
"""An anomaly detected in API usage."""
alert_type: str
severity: str # low, medium, high, critical
description: str
evidence: dict
timestamp: str
api_key_prefix: str
class APIKeyUsageAnalyzer:
"""Detect anomalous API key usage patterns that may
indicate compromise."""
# Approximate costs per 1K tokens (USD) as of early 2026
MODEL_COSTS = {
"gpt-4": {"input": 0.03, "output": 0.06},
"gpt-4-turbo": {"input": 0.01, "output": 0.03},
"gpt-4o": {"input": 0.005, "output": 0.015},
"gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
"claude-3-opus": {"input": 0.015, "output": 0.075},
"claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
"claude-3.5-haiku": {"input": 0.0008, "output": 0.004},
}
def __init__(self, baseline_window_days: int = 30):
self.baseline_window_days = baseline_window_days
self.baselines: dict[str, dict] = {} # key_prefix -> baseline stats
def build_baseline(self, historical_records: list[UsageRecord]) -> None:
"""Build usage baselines from historical data."""
key_records: dict[str, list[UsageRecord]] = defaultdict(list)
for record in historical_records:
key_records[record.api_key_prefix].append(record)
for key_prefix, records in key_records.items():
daily_requests: dict[str, int] = defaultdict(int)
daily_costs: dict[str, float] = defaultdict(float)
models_used: set[str] = set()
ips_seen: set[str] = set()
hourly_distribution: dict[int, int] = defaultdict(int)
for r in records:
try:
dt = datetime.fromisoformat(r.timestamp)
except (ValueError, TypeError):
continue
day_key = dt.strftime("%Y-%m-%d")
daily_requests[day_key] += 1
daily_costs[day_key] += r.cost_usd
models_used.add(r.model)
if r.source_ip:
ips_seen.add(r.source_ip)
hourly_distribution[dt.hour] += 1
req_counts = list(daily_requests.values()) or [0]
cost_values = list(daily_costs.values()) or [0.0]
self.baselines[key_prefix] = {
"avg_daily_requests": sum(req_counts) / max(len(req_counts), 1),
"max_daily_requests": max(req_counts),
"std_daily_requests": self._std(req_counts),
"avg_daily_cost": sum(cost_values) / max(len(cost_values), 1),
"max_daily_cost": max(cost_values),
"known_models": models_used,
"known_ips": ips_seen,
"active_hours": {
h for h, c in hourly_distribution.items()
if c > len(records) * 0.01 # at least 1% of traffic
},
}
def analyze_records(
self,
records: list[UsageRecord],
) -> list[AnomalyAlert]:
"""Analyze usage records against baselines to detect anomalies."""
alerts = []
# Group by key and day
key_day_records: dict[str, dict[str, list[UsageRecord]]] = defaultdict(
lambda: defaultdict(list)
)
for r in records:
try:
day = datetime.fromisoformat(r.timestamp).strftime("%Y-%m-%d")
except (ValueError, TypeError):
continue
key_day_records[r.api_key_prefix][day].append(r)
for key_prefix, days in key_day_records.items():
baseline = self.baselines.get(key_prefix)
if not baseline:
continue
for day, day_records in days.items():
alerts.extend(
self._check_volume_anomaly(key_prefix, day, day_records, baseline)
)
alerts.extend(
self._check_cost_anomaly(key_prefix, day, day_records, baseline)
)
alerts.extend(
self._check_model_anomaly(key_prefix, day_records, baseline)
)
alerts.extend(
self._check_ip_anomaly(key_prefix, day_records, baseline)
)
alerts.extend(
self._check_time_anomaly(key_prefix, day_records, baseline)
)
return sorted(alerts, key=lambda a: a.timestamp)
def _check_volume_anomaly(
self, key_prefix: str, day: str,
records: list[UsageRecord], baseline: dict,
) -> list[AnomalyAlert]:
"""Check for request volume anomalies."""
alerts = []
count = len(records)
avg = baseline["avg_daily_requests"]
std = baseline["std_daily_requests"]
if std > 0 and count > avg + 3 * std:
severity = "critical" if count > avg * 10 else "high"
alerts.append(AnomalyAlert(
alert_type="volume_spike",
severity=severity,
description=(
f"Key {key_prefix}: {count} requests on {day}, "
f"baseline avg={avg:.0f}, std={std:.0f}"
),
evidence={
"day": day,
"request_count": count,
"baseline_avg": avg,
"baseline_std": std,
"multiplier": round(count / max(avg, 1), 1),
},
timestamp=records[0].timestamp,
api_key_prefix=key_prefix,
))
return alerts
def _check_cost_anomaly(
self, key_prefix: str, day: str,
records: list[UsageRecord], baseline: dict,
) -> list[AnomalyAlert]:
"""Check for cost anomalies."""
alerts = []
total_cost = sum(r.cost_usd for r in records)
avg_cost = baseline["avg_daily_cost"]
if total_cost > max(avg_cost * 5, 10.0): # 5x baseline or $10 min
alerts.append(AnomalyAlert(
alert_type="cost_spike",
severity="critical",
description=(
f"Key {key_prefix}: ${total_cost:.2f} on {day}, "
f"baseline avg=${avg_cost:.2f}"
),
evidence={
"day": day,
"total_cost": total_cost,
"baseline_avg_cost": avg_cost,
},
timestamp=records[0].timestamp,
api_key_prefix=key_prefix,
))
return alerts
def _check_model_anomaly(
self, key_prefix: str,
records: list[UsageRecord], baseline: dict,
) -> list[AnomalyAlert]:
"""Check for usage of unexpected models."""
alerts = []
known = baseline.get("known_models", set())
for r in records:
if r.model not in known:
alerts.append(AnomalyAlert(
alert_type="unknown_model",
severity="high",
description=(
f"Key {key_prefix}: model '{r.model}' not in baseline. "
f"Known models: {known}"
),
evidence={
"model": r.model,
"known_models": list(known),
},
timestamp=r.timestamp,
api_key_prefix=key_prefix,
))
break # One alert per model is sufficient
return alerts
def _check_ip_anomaly(
self, key_prefix: str,
records: list[UsageRecord], baseline: dict,
) -> list[AnomalyAlert]:
"""Check for requests from unknown IP addresses."""
alerts = []
known_ips = baseline.get("known_ips", set())
if not known_ips:
return alerts
unknown_ips = set()
for r in records:
if r.source_ip and r.source_ip not in known_ips:
unknown_ips.add(r.source_ip)
if unknown_ips:
alerts.append(AnomalyAlert(
alert_type="unknown_source_ip",
severity="high",
description=(
f"Key {key_prefix}: requests from {len(unknown_ips)} "
f"unknown IP(s): {list(unknown_ips)[:5]}"
),
evidence={
"unknown_ips": list(unknown_ips),
"known_ips": list(known_ips)[:10],
},
timestamp=records[0].timestamp,
api_key_prefix=key_prefix,
))
return alerts
def _check_time_anomaly(
self, key_prefix: str,
records: list[UsageRecord], baseline: dict,
) -> list[AnomalyAlert]:
"""Check for requests outside normal operating hours."""
alerts = []
active_hours = baseline.get("active_hours", set())
if not active_hours:
return alerts
off_hours_count = 0
for r in records:
try:
hour = datetime.fromisoformat(r.timestamp).hour
if hour not in active_hours:
off_hours_count += 1
except (ValueError, TypeError):
continue
if off_hours_count > len(records) * 0.3: # >30% off-hours
alerts.append(AnomalyAlert(
alert_type="off_hours_usage",
severity="medium",
description=(
f"Key {key_prefix}: {off_hours_count}/{len(records)} "
f"requests outside normal hours"
),
evidence={
"off_hours_count": off_hours_count,
"total_records": len(records),
"active_hours": list(active_hours),
},
timestamp=records[0].timestamp,
api_key_prefix=key_prefix,
))
return alerts
@staticmethod
def _std(values: list[float]) -> float:
"""Calculate standard deviation."""
if len(values) < 2:
return 0.0
mean = sum(values) / len(values)
variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
return variance ** 0.5
Monitoring Billing Dashboards
Beyond API-level monitoring, billing dashboards provide a secondary detection signal. Most AI providers offer billing alerts, but the default thresholds are often set too high to catch early-stage compromise. Configure billing alerts at levels that are meaningful for your actual usage: if you typically spend $50/day on OpenAI, set alerts at $75, $150, and $500. The first threshold catches slow leaks, the second catches active misuse, and the third catches large-scale abuse.
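The tiered-threshold approach can be expressed as a small check run against each day's spend. The tier values below follow the $50/day example above; the class and function names are illustrative, not a provider API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BillingAlertTier:
    threshold_usd: float
    label: str

def crossed_tiers(daily_spend_usd: float, tiers: list[BillingAlertTier]) -> list[str]:
    """Return the label of every tier the day's spend has reached,
    lowest threshold first."""
    ordered = sorted(tiers, key=lambda t: t.threshold_usd)
    return [t.label for t in ordered if daily_spend_usd >= t.threshold_usd]

# Tiers sized for a ~$50/day baseline, per the guidance above.
DEFAULT_TIERS = [
    BillingAlertTier(75.0, "slow-leak"),
    BillingAlertTier(150.0, "active-misuse"),
    BillingAlertTier(500.0, "large-scale-abuse"),
]
```

A $160 day fires the first two tiers; if your alerting wants a single severity, take the highest label crossed.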
Scope Assessment
Determining What Was Exposed
Once a compromised key is identified, the immediate question is: what can the attacker do with it? The answer depends on the key's permissions and what it is connected to.
@dataclass
class KeyScopeAssessment:
"""Assessment of what a compromised key can access."""
key_prefix: str
provider: str
permissions: list[str]
connected_services: list[str]
data_exposure_risk: str # low, medium, high, critical
financial_exposure: str
findings: list[str]
def assess_openai_key_scope(
key_prefix: str,
org_settings: dict,
usage_history: list[dict],
) -> KeyScopeAssessment:
"""
Assess the scope of a compromised OpenAI API key.
Args:
key_prefix: The key identifier (last 4 chars).
org_settings: Organization settings from OpenAI dashboard.
usage_history: Historical usage records.
Returns:
KeyScopeAssessment with full scope analysis.
"""
permissions = []
findings = []
connected_services = []
data_risk = "low"
financial_risk = "low"
# Check what models are accessible
models_used = {r.get("model") for r in usage_history if r.get("model")}
permissions.append(f"model_access: {models_used}")
# Check for fine-tuning access
fine_tune_records = [
r for r in usage_history
if "fine_tun" in r.get("endpoint", "").lower()
]
if fine_tune_records:
permissions.append("fine_tuning")
findings.append(
"CRITICAL: Key has fine-tuning access. Attacker could "
"create poisoned models in your organization."
)
data_risk = "critical"
# Check for file upload access
file_records = [
r for r in usage_history
if "/files" in r.get("endpoint", "")
]
if file_records:
permissions.append("file_upload")
findings.append(
"HIGH: Key has file API access. Attacker could upload "
"training data or access existing uploaded files."
)
data_risk = max(data_risk, "high", key=_risk_ord)
# Check for assistants/threads access (may contain conversation history)
assistant_records = [
r for r in usage_history
if "/assistants" in r.get("endpoint", "")
or "/threads" in r.get("endpoint", "")
]
if assistant_records:
permissions.append("assistants_api")
connected_services.append("assistants")
findings.append(
"HIGH: Key has Assistants API access. Attacker could "
"read existing thread histories containing user data."
)
data_risk = max(data_risk, "high", key=_risk_ord)
# Estimate financial exposure
monthly_limit = org_settings.get("monthly_spend_limit")
if monthly_limit:
financial_risk = (
"critical" if monthly_limit > 10000
else "high" if monthly_limit > 1000
else "medium"
)
findings.append(
f"Monthly spend limit: ${monthly_limit}. "
f"Attacker could consume up to this amount."
)
else:
financial_risk = "critical"
findings.append(
"WARNING: No monthly spend limit configured. "
"Attacker has unlimited financial exposure."
)
return KeyScopeAssessment(
key_prefix=key_prefix,
provider="openai",
permissions=permissions,
connected_services=connected_services,
data_exposure_risk=data_risk,
financial_exposure=financial_risk,
findings=findings,
)
def _risk_ord(level: str) -> int:
"""Convert risk level to ordinal for comparison."""
return {"low": 0, "medium": 1, "high": 2, "critical": 3}.get(level, -1)
Assessing Connected Systems
A compromised API key is often more dangerous because of what it is connected to than because of the direct API access it provides. Map out the key's integration points. Is it used by an application that has access to a vector database? Does the application have tool-use capabilities that give the model access to internal systems? Is the key shared across multiple services?
For each connected system, document what data the attacker could access through the AI application, what actions they could trigger through tool use, and whether there are any rate limits or access controls beyond the API key itself.
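One way to keep this mapping consistent across incidents is a small record per integration point. The schema below is an assumption for illustration, not a standard format:

```python
from dataclasses import dataclass, field

@dataclass
class ConnectedSystem:
    """One system reachable through the AI application that uses the key."""
    name: str
    data_accessible: list[str]   # data the attacker could read via the app
    tool_actions: list[str]      # actions triggerable through tool use
    extra_controls: list[str] = field(default_factory=list)  # beyond the key

    def risk_notes(self) -> list[str]:
        """Summarize exposure for the scope-assessment report."""
        notes = []
        if self.data_accessible:
            notes.append(
                f"{self.name}: readable data: {', '.join(self.data_accessible)}"
            )
        if self.tool_actions:
            notes.append(
                f"{self.name}: triggerable actions: {', '.join(self.tool_actions)}"
            )
        if not self.extra_controls:
            notes.append(
                f"{self.name}: no access control beyond the API key itself"
            )
        return notes
```

Walking every integration point through `risk_notes` produces the per-system findings the scope assessment needs.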
Forensic Reconstruction of Attacker Activity
Building the Attack Timeline
With the compromised key identified and its scope assessed, the next step is reconstructing exactly what the attacker did. This requires correlating API usage logs, billing records, and any application-level logs.
from enum import Enum
class AttackPhase(Enum):
RECONNAISSANCE = "reconnaissance"
CAPABILITY_TESTING = "capability_testing"
EXPLOITATION = "exploitation"
PERSISTENCE = "persistence"
EXFILTRATION = "exfiltration"
@dataclass
class AttackerAction:
"""A reconstructed attacker action."""
timestamp: str
phase: AttackPhase
action: str
details: dict
confidence: float # 0-1
class APIKeyForensicReconstructor:
"""Reconstruct attacker activity from API usage logs
after a key compromise."""
def reconstruct_timeline(
self,
usage_records: list[UsageRecord],
compromise_start: str,
known_legitimate_ips: set[str],
) -> list[AttackerAction]:
"""
Reconstruct attacker activity timeline.
Args:
usage_records: All usage records for the compromised key.
compromise_start: ISO timestamp of estimated compromise start.
known_legitimate_ips: IPs known to be legitimate.
Returns:
Ordered list of reconstructed attacker actions.
"""
start_dt = datetime.fromisoformat(compromise_start)
# Filter to records after compromise and from unknown IPs
suspicious = []
for r in usage_records:
try:
r_dt = datetime.fromisoformat(r.timestamp)
except (ValueError, TypeError):
continue
if r_dt < start_dt:
continue
if r.source_ip and r.source_ip in known_legitimate_ips:
continue
suspicious.append(r)
suspicious.sort(key=lambda r: r.timestamp)
actions = []
# Phase detection based on temporal patterns and request types
for i, record in enumerate(suspicious):
phase = self._classify_phase(record, i, suspicious)
action_desc = self._describe_action(record, phase)
actions.append(AttackerAction(
timestamp=record.timestamp,
phase=phase,
action=action_desc,
details={
"model": record.model,
"endpoint": record.endpoint,
"input_tokens": record.input_tokens,
"output_tokens": record.output_tokens,
"source_ip": record.source_ip,
"status_code": record.status_code,
"cost_usd": record.cost_usd,
},
confidence=0.7, # Default; refined by context
))
return actions
def _classify_phase(
self,
record: UsageRecord,
index: int,
all_records: list[UsageRecord],
) -> AttackPhase:
"""Classify the attack phase of a given record."""
# First few requests are typically reconnaissance
if index < 3:
return AttackPhase.RECONNAISSANCE
# Small token counts suggest testing/probing
if record.input_tokens < 50 and record.output_tokens < 50:
return AttackPhase.CAPABILITY_TESTING
# Fine-tuning or file endpoints suggest persistence
if any(
kw in record.endpoint.lower()
for kw in ["fine-tun", "files", "batch"]
):
return AttackPhase.PERSISTENCE
# Large output token counts may indicate exfiltration
if record.output_tokens > 4000:
return AttackPhase.EXFILTRATION
return AttackPhase.EXPLOITATION
def _describe_action(
self,
record: UsageRecord,
phase: AttackPhase,
) -> str:
"""Generate a human-readable description of the action."""
descriptions = {
AttackPhase.RECONNAISSANCE: (
f"Probed {record.endpoint} with model {record.model}"
),
AttackPhase.CAPABILITY_TESTING: (
f"Tested {record.model} capabilities via {record.endpoint} "
f"({record.input_tokens} in, {record.output_tokens} out)"
),
AttackPhase.EXPLOITATION: (
f"Used {record.model} via {record.endpoint} "
f"({record.input_tokens}+{record.output_tokens} tokens, "
f"${record.cost_usd:.4f})"
),
AttackPhase.PERSISTENCE: (
f"Accessed {record.endpoint}, possible persistence mechanism"
),
AttackPhase.EXFILTRATION: (
f"Large output ({record.output_tokens} tokens) from "
f"{record.model}, possible data exfiltration"
),
}
return descriptions.get(phase, f"Unknown action on {record.endpoint}")
def generate_forensic_report(
self,
actions: list[AttackerAction],
key_scope: KeyScopeAssessment,
) -> str:
"""Generate a forensic report from reconstructed actions."""
lines = [
"=" * 70,
"API KEY COMPROMISE - FORENSIC INVESTIGATION REPORT",
"=" * 70,
"",
f"Compromised Key: ...{key_scope.key_prefix}",
f"Provider: {key_scope.provider}",
f"Data Exposure Risk: {key_scope.data_exposure_risk}",
f"Financial Exposure: {key_scope.financial_exposure}",
"",
"SCOPE FINDINGS:",
]
for finding in key_scope.findings:
lines.append(f" - {finding}")
lines.extend([
"",
f"ATTACKER ACTIONS: {len(actions)} total",
"",
])
# Group by phase
phase_groups: dict[AttackPhase, list[AttackerAction]] = defaultdict(list)
for action in actions:
phase_groups[action.phase].append(action)
for phase in AttackPhase:
phase_actions = phase_groups.get(phase, [])
if not phase_actions:
continue
lines.append(f"--- {phase.value.upper()} ({len(phase_actions)} actions) ---")
for a in phase_actions[:10]: # Show first 10 per phase
lines.append(f" [{a.timestamp}] {a.action}")
if len(phase_actions) > 10:
lines.append(f" ... and {len(phase_actions) - 10} more")
lines.append("")
# Financial impact
total_cost = sum(
a.details.get("cost_usd", 0) for a in actions
)
total_tokens = sum(
a.details.get("input_tokens", 0) + a.details.get("output_tokens", 0)
for a in actions
)
lines.extend([
"FINANCIAL IMPACT:",
f" Total attacker cost: ${total_cost:.2f}",
f" Total tokens consumed: {total_tokens:,}",
])
return "\n".join(lines)
Source Identification: How Was the Key Leaked?
Common Leak Vectors
Identifying how the key was compromised is essential for remediation. The most common leak vectors for AI API keys are:
Source code repositories: Keys committed to public or improperly secured repositories. Tools like GitHub's secret scanning, truffleHog, and GitLeaks can search commit history. Remember that even if the key is removed in a subsequent commit, it remains in git history.
Client-side exposure: Keys embedded in frontend JavaScript, mobile applications, or browser extensions. These are trivially extractable by any user.
CI/CD pipeline logs: Keys printed in build logs, test output, or deployment scripts. Many CI systems retain logs for weeks or months.
Environment variable leaks: Debug endpoints, error pages, or server status pages that dump environment variables.
Third-party service compromise: If the key is stored in a secrets manager, configuration service, or shared credential store that is itself compromised.
import subprocess
from pathlib import Path
def scan_git_history_for_key(
repo_path: str,
key_pattern: str,
) -> list[dict]:
"""
Scan git history for occurrences of an API key pattern.
Args:
repo_path: Path to the git repository.
key_pattern: Regex pattern matching the key format
(e.g., 'sk-[a-zA-Z0-9]{48}' for OpenAI keys).
Returns:
List of commits where the key was found.
"""
results = []
try:
# Pickaxe-regex search (-G): find commits whose diffs add or remove
# text matching the key pattern anywhere in history.
# (Plain -S matches a literal string, not a regex.)
cmd = [
"git", "-C", repo_path, "log", "--all",
f"-G{key_pattern}", "--format=%H|%an|%ae|%aI|%s",
]
output = subprocess.run(
cmd, capture_output=True, text=True, timeout=120,
)
for line in output.stdout.strip().split("\n"):
if not line or "|" not in line:
continue
parts = line.split("|", 4)
if len(parts) >= 5:
results.append({
"commit_hash": parts[0],
"author_name": parts[1],
"author_email": parts[2],
"date": parts[3],
"message": parts[4],
})
except subprocess.TimeoutExpired:
results.append({"error": "Git search timed out"})
except FileNotFoundError:
results.append({"error": "Git not found or invalid repo path"})
return results
Investigating Third-Party Exposure
Check whether the key appears in public data breach dumps, paste sites, or AI-specific key trading channels. Services like Have I Been Pwned (for associated email addresses) and specialized dark web monitoring can indicate whether the key has been publicly exposed. If the key is found in a public repository through GitHub's secret scanning alerts, the exposure timestamp helps establish the start of the compromise window.
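Combining an exposure timestamp with the first anomalous usage gives a conservative start for the compromise window. A minimal sketch, assuming ISO-8601 timestamps and that either signal may be missing:

```python
from datetime import datetime
from typing import Optional

def estimate_compromise_start(
    public_exposure_ts: Optional[str],
    first_anomaly_ts: Optional[str],
) -> Optional[str]:
    """Earliest-evidence rule: the compromise window opens at the
    earlier of public exposure and first anomalous usage."""
    candidates = [
        datetime.fromisoformat(ts)
        for ts in (public_exposure_ts, first_anomaly_ts)
        if ts
    ]
    return min(candidates).isoformat() if candidates else None
```

The result feeds directly into the `compromise_start` argument of the timeline reconstruction above.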
Remediation Procedures
Immediate Response
The immediate response to a confirmed API key compromise follows this sequence:
1. Rotate the key immediately. Generate a new key and update all legitimate consumers. Rotating first, rather than simply disabling the old key, lets you update consumers without downtime.
2. Set or reduce spending limits. If the provider supports it, set a hard spending cap to limit further financial damage while you complete the rotation.
3. Review recent API activity. Use the forensic reconstruction techniques above to understand what the attacker did with the key.
4. Check for persistence. If the attacker had fine-tuning access, review all fine-tuning jobs. If they had file access, review uploaded files. If they had assistant access, review assistants and threads for modifications.
5. Notify affected parties. If user data was potentially exposed through the compromised key (e.g., through access to conversation histories or RAG-connected data), initiate your data breach notification procedures.
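The sequence lends itself to a tracked runbook so no step is skipped under pressure. The step names below are shorthand for the list above; this is a sketch, not an orchestration tool:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Containment steps in required order (shorthand for the list above).
RESPONSE_STEPS = [
    "rotate_key",
    "set_spend_limit",
    "review_recent_activity",
    "check_persistence",
    "notify_affected_parties",
]

@dataclass
class ResponseRunbook:
    incident_id: str
    completed_at: dict[str, str] = field(default_factory=dict)  # step -> ISO time

    def complete(self, step: str) -> None:
        """Record completion time for a step."""
        if step not in RESPONSE_STEPS:
            raise ValueError(f"unknown step: {step}")
        self.completed_at[step] = datetime.now(timezone.utc).isoformat()

    def next_step(self) -> Optional[str]:
        """First step in sequence order not yet marked complete."""
        for step in RESPONSE_STEPS:
            if step not in self.completed_at:
                return step
        return None
```

The timestamps captured here also become the containment portion of the post-incident timeline.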
Conducting a Post-Incident Review
After the immediate response is complete, conduct a structured post-incident review that goes beyond the technical details of the compromise. Document the full timeline from initial key exposure through detection through containment. Calculate the total financial impact, including both the attacker's consumption and the cost of the investigation and remediation effort.
Critically evaluate your detection capabilities. How long was the key compromised before detection (dwell time)? What signal ultimately triggered the investigation? Could that signal have been detected earlier with different alerting thresholds? Many organizations discover that their API key compromise dwell time is measured in weeks or months because their monitoring was not granular enough to catch the early stages of abuse.
Review the key management lifecycle that led to the exposure. Map out every system and person who had access to the key. Identify where the key was stored, how it was distributed, and what controls were in place at each point. The root cause is rarely "the key was leaked" — it is "the key was in a location where it could be leaked because our key management process allowed it."
Document specific indicators of compromise (IOCs) from the investigation: the attacker's IP addresses, user agents, request patterns, and timing characteristics. Share these IOCs with your security operations team so they can build detections that catch similar activity in the future, and consider sharing them with your AI provider's abuse team.
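A consistent IOC export format makes the handoff to the SOC and the provider's abuse team easier. Plain JSON is shown here as one option; the field names are an assumed schema:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class IncidentIOCs:
    incident_id: str
    attacker_ips: list[str] = field(default_factory=list)
    user_agents: list[str] = field(default_factory=list)
    request_patterns: list[str] = field(default_factory=list)  # free-text notes

def export_iocs(iocs: IncidentIOCs) -> str:
    """Serialize IOCs to stable, diff-friendly JSON for sharing."""
    payload = asdict(iocs)
    for key in ("attacker_ips", "user_agents", "request_patterns"):
        payload[key] = sorted(set(payload[key]))  # dedupe before sharing
    return json.dumps(payload, indent=2, sort_keys=True)
```

Sorting and deduplicating keeps repeated exports byte-stable, so diffs show only genuinely new indicators.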
Provider-Specific Investigation Resources
Each major AI API provider offers different tools for investigating compromised keys. OpenAI provides a usage dashboard with per-key breakdowns and an API for programmatic usage queries. The organization audit log captures key creation, modification, and deletion events. Anthropic's console provides usage tracking by API key with cost breakdowns. Google Cloud's AI Platform integrates with Cloud Audit Logs for detailed request-level tracking. For Azure OpenAI, diagnostic logs in Azure Monitor provide the most detailed view, including full request and response content when enabled.
When working with provider support teams during an investigation, provide them with the compromised key identifier, the suspected compromise window, and any attacker IP addresses you have identified. Most providers can provide additional forensic data from their side, including request-level details that may not be available through your own logging.
Long-Term Hardening
After the immediate incident is resolved, implement controls to prevent recurrence:
- Use scoped API keys with minimum necessary permissions. If your application only needs chat completions, do not give it file or fine-tuning access.
- Implement API key rotation on a regular schedule (at least quarterly).
- Use secrets managers (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault) rather than environment variables or configuration files.
- Deploy pre-commit hooks that scan for API key patterns before code is committed.
- Configure billing alerts at meaningful thresholds.
- Implement IP allowlisting where supported by the provider.
- Monitor for leaked keys using automated scanning of your repositories and third-party exposure monitoring.
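A pre-commit scan can be as simple as a regex pass over staged text. The patterns below are rough approximations; real key formats vary by provider and change over time, so treat them as assumptions to tune:

```python
import re

# Approximate, illustrative patterns -- verify against each provider's
# current key format before relying on them.
KEY_PATTERNS = {
    "openai_style": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "google_api_style": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
    "generic_secret": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
}

def scan_text_for_keys(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, redacted_match) pairs for anything that
    looks like a key; redact so the hook itself never logs secrets."""
    hits = []
    for name, pattern in KEY_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group(0)[:6] + "..."))
    return hits
```

Wire this into a pre-commit hook that scans the output of `git diff --cached` and exits nonzero on any hit.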
Proactive API Key Security Monitoring
Continuous Monitoring Architecture
A robust API key security program requires continuous monitoring rather than reactive investigation. Deploy an automated monitoring pipeline that ingests API usage data in near-real-time, applies the anomaly detection algorithms described above, and routes alerts to your security operations team.
The monitoring pipeline should process usage records within minutes of their generation. Most AI API providers offer usage data with a delay of 5-30 minutes. Configure the pipeline to pull usage data at this cadence, apply the baseline comparison, and generate alerts for anomalies that exceed your configured thresholds. Feed the alerts into your existing SIEM or security alerting infrastructure so they are triaged alongside other security events.
Regularly rotate the baselines used for anomaly detection. As your application's usage grows and changes, the baseline must evolve to avoid false positives from legitimate growth and false negatives from gradual abuse. Recompute baselines weekly using the most recent 30 days of data, excluding any days where confirmed incidents occurred.
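Baseline recomputation is mostly a filtering step before the stats are rebuilt. A minimal sketch, assuming records carry naive ISO-8601 timestamps and that confirmed-incident days are supplied as "YYYY-MM-DD" strings:

```python
from datetime import datetime, timedelta
from typing import Optional

def select_baseline_records(
    records: list[dict],
    window_days: int = 30,
    excluded_days: frozenset[str] = frozenset(),  # confirmed-incident days
    now: Optional[datetime] = None,
) -> list[dict]:
    """Keep records from the trailing window, skipping incident days,
    ready to feed into baseline rebuilding."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=window_days)
    kept = []
    for r in records:
        try:
            dt = datetime.fromisoformat(r["timestamp"])
        except (KeyError, TypeError, ValueError):
            continue  # unparseable records cannot anchor a baseline
        if dt < cutoff or dt.strftime("%Y-%m-%d") in excluded_days:
            continue
        kept.append(r)
    return kept
```

Run this weekly and hand the surviving records (converted to `UsageRecord`) to `build_baseline` so legitimate growth is absorbed while incident-day traffic never contaminates the baseline.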
References
- Lasso Security (2023). "Thousands of API keys and credentials found in public LLM training data." https://www.lasso.security/blog/thousands-of-api-keys-found-in-llm-training-data
- OWASP Foundation (2025). "OWASP Top 10 for LLM Applications - LLM06: Sensitive Information Disclosure." https://owasp.org/www-project-top-10-for-large-language-model-applications/
- Truffle Security (2024). "TruffleHog: Find and verify credentials in git repositories." https://github.com/trufflesecurity/trufflehog