Data Exfiltration Incident Response Playbook
Playbook for AI-mediated data exfiltration: identifying exposed data, determining exfiltration scope, data classification, breach notification procedures, and remediation.
This playbook covers incidents where an AI system has disclosed, leaked, or exfiltrated data that should not have been accessible to the requesting user. This includes training data extraction, RAG data leakage across authorization boundaries, system prompt disclosure containing sensitive information, PII disclosure, and agent-mediated data exfiltration through tool calls.
Trigger Criteria
Activate this playbook when:
- Model output contains data classified above the user's authorization level
- Model output contains PII, PHI, financial data, or trade secrets
- System prompt containing sensitive information (API keys, internal URLs, credentials) is disclosed
- RAG system returns documents from another tenant or unauthorized collection
- Agent tool calls retrieve or transmit data outside authorized boundaries
- Training data extraction is confirmed (model reproduces memorized private data)
Immediate Actions (First 30 Minutes)
Acknowledge and assign
Incident ID: AI-IR-[YYYY]-[NNNN]
Type: Data Exfiltration
Detected: [UTC timestamp]
IC: [Name]
AI Investigator: [Name]
Evidence Custodian: [Name]
Legal Contact: [Name] ← Required for data exfiltration incidents

Preserve evidence with chain of custody
Data exfiltration incidents may require evidence for legal proceedings or regulatory response. Apply strict chain of custody from the start:
- Capture complete conversation with all model outputs containing exposed data
- Record hashes of all evidence files
- Log the identity of the person collecting evidence and the collection method
- Store in tamper-evident, access-controlled evidence repository
- Capture the user identity and session metadata of the requester
- Preserve the model version, system prompt, and configuration active at the time
- If RAG is involved, capture the retrieval logs and document index state
- If tools are involved, capture all tool call logs with parameters and results
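The hashing and custody-logging steps above can be sketched as a small helper that fingerprints each evidence file and appends a tamper-evident record. The `record_evidence` helper, log format, and file names are illustrative, not part of any specific forensics tool:

```python
import getpass
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_evidence(path: str, method: str, log_file: str = "custody_log.jsonl") -> dict:
    """Hash an evidence file and append a chain-of-custody entry."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    entry = {
        "file": path,
        "sha256": digest,
        "collected_by": getpass.getuser(),       # who collected the evidence
        "collection_method": method,             # e.g. "log export", "screenshot"
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append-only JSONL keeps a simple, ordered custody trail.
    with open(log_file, "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

In practice the custody log itself should live in the access-controlled evidence repository, and its own hash should be recorded periodically.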
Stop the bleeding
Prevent further exfiltration immediately:
| Exfiltration Channel | Containment Action |
|---|---|
| Model output | Add output classifier for the data type; consider blocking the user |
| System prompt in output | Rotate any credentials in the system prompt immediately; update prompt |
| RAG cross-tenant leakage | Disable shared retrieval or add strict tenant filtering |
| Tool call exfiltration | Restrict tool access; block outbound network for the agent |
| Training data extraction | Add output filter for known training data patterns; rate limit |

Initial data classification
Immediately classify the exposed data to determine notification urgency:
| Data Type | Classification | Notification Clock |
|---|---|---|
| PII (names, emails, SSNs, etc.) | Regulated (GDPR/CCPA/HIPAA) | Starts now -- 72h for GDPR |
| PHI (health information) | HIPAA regulated | Starts now -- 60-day deadline |
| Financial data (accounts, cards) | PCI/financial regulations | Starts now |
| Credentials (API keys, passwords) | Internal -- credential rotation needed | Immediate rotation |
| Trade secrets | Internal -- legal review needed | Legal assessment required |
| System configuration (URLs, architecture) | Internal -- security assessment needed | Risk assessment required |
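As a minimal sketch, the classification table can be encoded as a lookup that defaults unknown data types to the most conservative handling. The category keys and the `classify` helper are hypothetical; map them to your own data-classification scheme:

```python
# Illustrative mapping of exposed data type -> (classification, notification clock).
NOTIFICATION_CLOCKS = {
    "pii": ("Regulated (GDPR/CCPA/HIPAA)", "72h to supervisory authority under GDPR"),
    "phi": ("HIPAA regulated", "60 days to affected individuals"),
    "financial": ("PCI/financial regulations", "starts immediately"),
    "credentials": ("Internal -- rotation needed", "rotate immediately"),
    "trade_secret": ("Internal -- legal review needed", "legal assessment required"),
    "system_config": ("Internal -- security assessment needed", "risk assessment required"),
}

def classify(data_type: str) -> tuple:
    """Return (classification, notification clock) for an exposed data type."""
    try:
        return NOTIFICATION_CLOCKS[data_type]
    except KeyError:
        # Unknown types get the most conservative treatment until reviewed.
        return ("Unclassified -- escalate to legal", "treat as regulated until reviewed")
```

The conservative default matters: misclassifying regulated data as internal can silently burn through a notification deadline.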
Investigation (Hours 1-4)
Scope Determination
The critical question is: how much data was exposed and to whom?
Identify all exfiltration instances
Search logs for all instances where the identified data type was exposed:
```sql
-- Search for PII patterns in model outputs
SELECT session_id, user_id, timestamp, output_content, model_version
FROM prompt_completion_logs
WHERE (
    output_content ~ '\b\d{3}-\d{2}-\d{4}\b'                                   -- SSN pattern
    OR output_content ~ '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'   -- Email
    OR output_content ~ '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'           -- Credit card
    -- Add patterns specific to your data types
)
AND timestamp > '[incident_window_start]'
AND timestamp < '[incident_window_end]'
ORDER BY timestamp;
```

Determine the data source
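The same patterns are useful outside the database, e.g. when scanning exported transcripts or evidence files. This Python sketch mirrors the SQL regexes; the `scan_output` helper and pattern names are illustrative, and real scanners should use vetted detectors for your specific data types:

```python
import re

# Regexes mirroring the SQL patterns used in the log search; tune to your data.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "credit_card": re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),
}

def scan_output(text: str) -> dict:
    """Return all PII-like matches in a model output, keyed by pattern name."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    # Drop pattern names with no matches so an empty dict means "clean".
    return {name: matches for name, matches in hits.items() if matches}
```

Expect false positives from pattern matching alone (e.g. order numbers resembling card numbers); treat matches as leads for the exposure inventory, not confirmed exposures.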
Identify how the model obtained the exposed data:
| Source | How to Verify | Implication |
|---|---|---|
| Training data memorization | Test with extraction prompts on a fresh model instance | Affects all conversations; model-level issue |
| RAG retrieval | Check retrieval logs for the documents returned | Affects users who can trigger specific retrievals |
| System prompt content | Review the system prompt for embedded sensitive data | Affects all users of this endpoint |
| Tool call results | Check tool call logs for data access | Depends on tool access controls |
| Conversation history | Check for context leakage between users | Architecture-level multi-tenancy issue |

Quantify the exposure
Build a complete exposure inventory:
```markdown
## Exposure Inventory

### Data Types Exposed
- [ ] PII: [types, count of records]
- [ ] PHI: [types, count of records]
- [ ] Financial: [types, count of records]
- [ ] Credentials: [types, count]
- [ ] Trade secrets: [description]
- [ ] System configuration: [description]

### Individuals Affected
- Count: [number]
- Jurisdictions: [list of applicable jurisdictions]
- Data subjects: [customers / employees / third parties]

### Exposure Recipients
- Who received the data: [user IDs, count]
- Were they authorized to access this data: [yes/no]
- Is any recipient a known threat actor: [yes/no/unknown]

### Exposure Duration
- First known exposure: [timestamp]
- Last known exposure: [timestamp]
- Total duration: [duration]
```

Assess downstream propagation
Determine whether the exposed data has propagated beyond the initial disclosure:
- Has the user shared, copied, or forwarded the model's output?
- Was the model's output fed into any downstream system?
- Is the conversation data accessible to other users or systems?
- Has the data appeared on any public forums or paste sites?
Containment and Remediation
Immediate Remediation
| Action | When | Owner |
|---|---|---|
| Rotate exposed credentials | Immediately upon identification | Security team |
| Add output filter for data type | Within 1 hour | AI team |
| Fix RAG access controls | Within 1 hour if RAG is the source | Data engineering |
| Update system prompt | Within 1 hour if prompt contained sensitive data | Application team |
| Block confirmed attacker | Immediately | Security team |
Root Cause Remediation
| Root Cause | Remediation | Timeline |
|---|---|---|
| Training data memorization | Data deduplication, unlearning techniques, output filtering | Days to weeks |
| RAG access control failure | Implement per-user document filtering, tenant isolation | Days |
| Sensitive system prompt | Remove sensitive data from prompt; use backend configuration | Hours |
| Tool access control failure | Implement least-privilege tool access, parameter validation | Days |
| Multi-tenant context leakage | Architecture review, session isolation | Weeks |
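For the RAG access-control remediations above, tenant isolation is most reliable when enforced at the retrieval layer rather than in prompt logic. A minimal sketch, assuming a vector store whose `search()` accepts a metadata filter; the filter syntax, `tenant_id` field, and `retrieve` wrapper are assumptions, not a specific library's API:

```python
def retrieve(index, query_embedding, tenant_id: str, top_k: int = 5):
    """Retrieve documents for a query, restricted to the caller's tenant.

    `index` stands in for any vector store supporting metadata filters.
    """
    results = index.search(
        query_embedding,
        top_k=top_k,
        filter={"tenant_id": tenant_id},  # enforce tenancy at the store
    )
    # Defense in depth: re-check tenancy on the way out in case the
    # store-side filter is missing or misconfigured.
    return [doc for doc in results if doc.metadata.get("tenant_id") == tenant_id]
```

The post-filter check is what turns a store misconfiguration into a silent degradation instead of a cross-tenant leak.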
Notification Decision Framework
Step 1: Is This a Reportable Breach?
| Factor | Assessment |
|---|---|
| Was personal data of identifiable individuals exposed? | If yes, likely reportable |
| Was the data encrypted at the time of exposure? | If encrypted and key not compromised, may reduce notification scope |
| Was the exposure limited to the AI model's internal processing? | If data was in the model's context but not in the output, may not be an exposure |
| Was the recipient authorized to access this data? | If authorized, may not be a breach |
| Has the data been recovered or confirmed deleted? | May reduce but does not eliminate notification obligation |
Step 2: Notification Requirements
| Jurisdiction / Regulation | Threshold | Timeline | Recipient |
|---|---|---|---|
| GDPR | Any unauthorized disclosure of personal data | 72 hours (authority), without undue delay (individuals) | Supervisory authority + affected individuals if high risk |
| CCPA/CPRA | Unauthorized access to unencrypted personal information | Expeditious | California AG + affected individuals |
| HIPAA | Unauthorized disclosure of PHI | 60 days (individuals), annually (HHS if <500), 60 days (HHS if >=500) | HHS OCR + affected individuals |
| State breach laws | Varies by state | Varies (30-90 days typical) | State AG + affected individuals |
| SEC rules | Material cybersecurity incident | 4 business days | SEC (Form 8-K) |
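The notification clocks in the table run from the moment the organization becomes aware of the breach. This sketch computes two of the deadlines above; the `notification_deadline` helper is illustrative, and actual deadlines must be confirmed with counsel for each jurisdiction:

```python
from datetime import datetime, timedelta, timezone

# Illustrative deadlines from the table; both run from the moment of awareness.
DEADLINES = {
    "gdpr_authority": timedelta(hours=72),
    "hipaa_individuals": timedelta(days=60),
}

def notification_deadline(awareness_utc: datetime, regulation: str) -> datetime:
    """Deadline measured from when the organization became aware of the breach."""
    return awareness_utc + DEADLINES[regulation]
```

Note that "awareness" is a legal determination (typically when a responsible person has a reasonable degree of certainty a breach occurred), which is why the timestamp of first detection must be preserved with the evidence.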
Step 3: Notification Content
Legal counsel should draft all external notifications. Provide them with:
- Exact data types exposed and record counts
- Number of affected individuals and their jurisdictions
- Timeline of exposure (first to last known instance)
- Containment actions taken
- Remediation steps planned
- Whether the exposure was to a single user or multiple parties
Verification
| Check | Procedure | Success Criteria |
|---|---|---|
| Exfiltration channel closed | Attempt the original exfiltration technique 50 times | 0% success rate |
| Data type filter effective | Test output filter with known data patterns | All patterns caught |
| Access controls enforced | Test cross-boundary access (RAG, tools) | All unauthorized access blocked |
| Credentials rotated | Verify old credentials no longer work | Confirmed revoked |
| No false positives | Test legitimate queries that use similar patterns | Normal functionality preserved |
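The "exfiltration channel closed" check lends itself to automation as a replay harness. In this sketch, `send_prompt` (your model client) and `detector` (a function that flags the exposed data in an output) are assumed stand-ins for your own tooling:

```python
def verify_channel_closed(send_prompt, exploit_prompt: str, detector, attempts: int = 50) -> bool:
    """Replay the original exfiltration prompt and require a 0% success rate.

    send_prompt: callable taking a prompt string, returning model output text.
    detector: callable returning True when an output still contains the exposed data.
    """
    successes = sum(1 for _ in range(attempts) if detector(send_prompt(exploit_prompt)))
    print(f"{successes}/{attempts} attempts leaked data")
    return successes == 0
```

Replaying multiple times matters because model outputs are non-deterministic: a filter that blocks the exploit nine times out of ten still fails this check.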
Post-Mortem Checklist
| # | Item | Status |
|---|---|---|
| 1 | Complete exposure inventory documented | |
| 2 | All affected individuals identified | |
| 3 | Notification decisions documented with legal review | |
| 4 | All notifications sent within required timelines | |
| 5 | Root cause identified and confirmed | |
| 6 | Remediation deployed and verified | |
| 7 | Data classification of AI training data reviewed | |
| 8 | RAG access controls audited | |
| 9 | System prompts reviewed for sensitive content | |
| 10 | Tool access controls audited | |
| 11 | Monitoring for the data type added to production | |
Related Topics
- Incident Classification -- data leak incident classification
- Escalation Paths -- regulatory notification procedures
- Tool Call Forensics -- investigating tool-mediated exfiltration
- Conversation Preservation -- preserving exfiltration evidence
- Governance, Legal & Compliance -- regulatory landscape for AI data incidents
References
- "GDPR Article 33: Notification of a Personal Data Breach" - European Parliament (2016) - 72-hour notification requirement
- "NIST SP 800-122: Guide to Protecting PII" - NIST (2010) - PII handling and breach response
- "AI-Mediated Data Breaches: Legal and Technical Analysis" - Berkman Klein Center (2025) - Legal analysis of AI-specific data breach scenarios
- "OWASP Top 10 for LLM Applications: LLM06 - Sensitive Information Disclosure" - OWASP Foundation (2025)