ChatGPT Data Leak (March 2023)
Analysis of the March 2023 ChatGPT incident where a Redis client library bug caused users to see other users' conversation titles, partial chat history, and payment information. Covers root cause, impact, and lessons for AI application security.
On March 20, 2023, OpenAI temporarily took ChatGPT offline after users reported seeing other users' conversation titles in their chat sidebar. Investigation revealed a deeper issue: under specific conditions, users could also see another user's first and last name, email address, payment address, the last four digits of a credit card number, and credit card expiration date. The root cause was a bug in an open-source Redis client library, not in the AI model itself.
Incident Timeline
| Date | Event |
|---|---|
| March 20, 2023 (morning) | Users begin reporting seeing unfamiliar conversation titles in their ChatGPT sidebar |
| March 20, 2023 (afternoon) | Reports spread on social media with screenshots showing other users' conversation history titles |
| March 20, 2023 (evening) | OpenAI takes ChatGPT offline for investigation |
| March 21, 2023 | OpenAI identifies the root cause in the Redis client library redis-py |
| March 22, 2023 | OpenAI brings ChatGPT back online with the fix deployed |
| March 24, 2023 | OpenAI publishes a detailed incident report |
| March 24, 2023 | OpenAI discloses that payment information was also exposed for approximately 1.2% of ChatGPT Plus subscribers |
Root Cause Analysis
Immediate Cause
The Redis client library redis-py had a bug in its connection handling. When a request was cancelled after it had been sent to the Redis server but before the response was read back, the connection was returned to the shared pool with an unread response still buffered. The next request served by that connection could then receive the stale response, delivering one user's data to another.
Technical Mechanism
Normal flow:
User A requests conversation list → Redis connection 1 → Response A → User A sees their data
Bug flow:
User A requests conversation list → Redis connection 1 → Request cancelled
User B requests conversation list → Redis connection 1 (reused) → Stale Response A → User B sees User A's data
The race condition occurred specifically during a period of high server load when OpenAI was scaling their Redis cluster. Request cancellations were more frequent during this period, increasing the probability of the bug triggering.
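The two flows above can be sketched as a minimal simulation. The `Connection` class below is an illustrative stand-in for a pooled Redis connection whose responses arrive in order; it is not any real client's API:

```python
from collections import deque

class Connection:
    """Fake pooled connection: server replies queue up in arrival order."""
    def __init__(self):
        self.responses = deque()

    def send(self, user, query):
        # The "server" answers every request; the reply sits buffered
        # until the client reads it.
        self.responses.append(f"data for {user}")

    def read(self):
        return self.responses.popleft()

# Normal flow: User A sends a request and reads their own response.
conn = Connection()
conn.send("A", "LRANGE conversations")
assert conn.read() == "data for A"

# Bug flow: A's request is cancelled AFTER it reached the server but
# BEFORE the client read the reply, so the stale reply stays buffered.
conn.send("A", "LRANGE conversations")
# ... client-side cancellation: the read never happens ...

# The pooled connection is reused for User B, who reads the stale reply.
conn.send("B", "LRANGE conversations")
leaked = conn.read()
print(leaked)  # "data for A" -- User B sees User A's data
```

The simulation makes the core point concrete: the leak requires no attacker, only a request/response pairing that falls out of sync on a shared connection.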
Contributing Factors
| Level | Contributing Factor |
|---|---|
| Infrastructure | Redis client library bug in open-source dependency |
| Application | No data isolation validation between cache and response delivery |
| Architecture | Shared Redis connection pool serving multiple users without per-response integrity checks |
| Operational | Scaling operations during a period of high traffic increased connection reuse and cancellation rates |
| Testing | Race condition not covered by existing test suites |
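A defensive pattern that addresses the immediate cause, similar in spirit to how the cancellation issue was ultimately handled in redis-py's asyncio client, is to discard rather than reuse any connection whose request was cancelled mid-flight. The `pool` interface below (`acquire`/`release`/`discard`) is illustrative, not a real library API:

```python
import asyncio

async def execute(pool, command):
    """Run one command over a pooled connection. If the task is
    cancelled mid-flight, discard the connection instead of returning
    it to the pool: an unread response would otherwise desynchronize
    request/response pairing for the next borrower.
    Note: `pool` is a hypothetical interface for illustration."""
    conn = await pool.acquire()
    try:
        await conn.send(command)
        response = await conn.read_response()
    except (asyncio.CancelledError, Exception):
        # Never reuse a connection whose protocol state is uncertain.
        await pool.discard(conn)
        raise
    await pool.release(conn)
    return response
```

The design choice is to trade a dropped connection (cheap to reopen) for the guarantee that a pooled connection is always in a known request/response state.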
What Was NOT the Cause
This incident was explicitly not caused by:
- A vulnerability in the GPT model
- Prompt injection or jailbreaking
- Intentional data collection or sharing
- A breach of OpenAI's systems by an external attacker
Impact Assessment
Data Exposure
| Data Type | Scope | Severity |
|---|---|---|
| Conversation titles | Unknown number of users (visible in sidebar) | Medium -- titles may reveal topics but not full conversations |
| First message of conversations | Small number of users during the active bug window | High -- conversation content may contain sensitive information |
| Payment information (name, email, last 4 digits of CC, expiry) | ~1.2% of ChatGPT Plus subscribers active during a 9-hour window | Critical -- financial PII exposure |
Broader Impact
- User trust: Significant erosion of trust in ChatGPT's privacy protections. Users who shared sensitive information in conversations questioned whether that data was secure.
- Regulatory attention: The incident attracted attention from privacy regulators, particularly in the EU, where it contributed to Italy's temporary ban on ChatGPT.
- Industry-wide effect: Raised awareness that AI applications face the same infrastructure security challenges as traditional web applications, plus additional privacy risks from conversation data.
Lessons Learned
For AI Application Developers
- Cache isolation is critical. Multi-tenant AI applications that cache user data must implement strict cache isolation. Every cache response should be validated against the requesting user's identity before delivery.
- Conversation data is PII. Chat conversations with an AI system are sensitive personal data. They may contain health information, financial details, legal questions, and personal opinions. Treat conversation storage with the same rigor as any other PII database.
- Dependency security matters. The vulnerability was in a third-party open-source library, not in OpenAI's code. AI applications must audit their dependency chains for security issues, especially in components that handle user data.
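The cache-isolation lesson can be enforced mechanically: tag every cached value with its owner and verify the requester before returning it. The wrapper below is a minimal sketch of this pattern; the class and method names are illustrative, not a real library's API:

```python
class OwnerTaggedCache:
    """Cache wrapper that refuses to serve an entry to anyone but its
    owner. A mismatch fails closed, since it indicates either cache
    corruption or a bug upstream."""
    def __init__(self):
        self._store = {}

    def put(self, key, owner_id, value):
        self._store[key] = (owner_id, value)

    def get(self, key, requester_id):
        entry = self._store.get(key)
        if entry is None:
            return None
        owner_id, value = entry
        if owner_id != requester_id:
            raise PermissionError(
                f"cache entry owned by {owner_id} requested by {requester_id}"
            )
        return value

# Usage: the mismatch raises instead of silently leaking.
cache = OwnerTaggedCache()
cache.put("conv:123", "user_a", ["My tax questions"])
assert cache.get("conv:123", "user_a") == ["My tax questions"]
```

Had an equivalent check sat between the Redis cache and response delivery, the stale response would have been rejected rather than shown to the wrong user.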
For Red Teams
This incident reveals categories of tests that should be included in AI application security assessments:
| Test Category | Specific Test |
|---|---|
| Session isolation | Can one user's session data leak to another user through caching, connection pooling, or shared state? |
| Error handling | What happens when requests are cancelled, time out, or fail? Are error responses properly scoped to the requesting user? |
| Concurrent access | Under high load, do race conditions expose cross-user data? |
| Payment data isolation | Is payment information accessible through any pathway other than the intended payment management interface? |
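The session-isolation and concurrent-access rows above can be combined into one probe: hammer the target with concurrent requests as different users and verify every response belongs to its requester. The harness below is a sketch; `fetch(user)` stands for a hypothetical client call against the target application that returns `(owner_id, payload)`:

```python
import concurrent.futures

def check_session_isolation(fetch, users, rounds=50):
    """Fire concurrent requests as different users and verify each
    response belongs to the user who asked for it.
    `fetch` is an assumed test-client callable, not a real API.
    Returns a list of (requester, actual_owner) pairs; an empty list
    means no cross-user leakage was observed."""
    leaks = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        for _ in range(rounds):
            futures = {pool.submit(fetch, u): u for u in users}
            for fut, user in futures.items():
                owner, _payload = fut.result()
                if owner != user:
                    leaks.append((user, owner))
    return leaks
```

Because race conditions like the redis-py bug are probabilistic, a passing run is weak evidence; high round counts and load applied during the run (mirroring the conditions under which this incident triggered) make the probe more meaningful.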
For Organizations Deploying AI
- Assume AI conversations contain sensitive data. Users will share confidential information with AI assistants. Design data handling accordingly.
- Monitor for data leakage. Implement anomaly detection that identifies when a user receives data that does not match their expected profile.
- Have an incident response plan for AI-specific incidents. AI data leaks have unique characteristics (conversation content, model behavior) that require specialized response procedures.
Relevance to Red Teaming
This incident underscores that AI red teaming must extend beyond model-layer attacks:
- Infrastructure testing should include cache isolation, connection pooling behavior, and session management under load
- Multi-tenant testing should verify that user data boundaries are enforced at every layer, not just the application layer
- Dependency auditing should cover all libraries in the AI application's dependency chain, with special attention to data-handling components
- Load testing should include security-relevant scenarios: do security properties hold under high concurrency?
Related Topics
- Incident Analysis Methodology - Framework applied in this analysis
- Cloud ML Platforms - Infrastructure security for AI deployments
- Full-Stack AI Exploitation - Multi-layer exploitation including infrastructure
- Lessons Learned - Cross-incident pattern analysis
References
- "March 20 ChatGPT Outage: Here's What Happened" - OpenAI Blog (March 24, 2023) - Official incident report from OpenAI
- "ChatGPT Bug Exposed Users' Conversation Histories and Payment Details" - Ars Technica (March 2023) - Detailed technical coverage of the incident
- "Italian Data Protection Authority Bans ChatGPT" - Garante per la protezione dei dati personali (March 2023) - Regulatory action partly influenced by this incident
- "redis-py Issue #2624" - GitHub (2023) - The specific bug report in the Redis client library