Samsung Code Leak via ChatGPT
Analysis of the April 2023 incident where Samsung employees leaked proprietary source code, test data, and internal meeting notes by entering them into ChatGPT. Covers data loss prevention, acceptable use policies, and enterprise AI governance.
In early 2023, Samsung Semiconductor experienced three separate incidents within roughly 20 days of lifting its internal ban on ChatGPT, in which employees entered confidential information into the service; the lapses were reported publicly in early April 2023. Engineers shared proprietary semiconductor source code for debugging assistance and test sequence data for optimization, and an employee submitted an internal meeting transcript for summarization. Because OpenAI's default data policy at the time allowed user inputs to be used for model training, Samsung's proprietary information was potentially exposed to OpenAI and, through training, to future users of the model.
Incident Timeline
| Date | Event |
|---|---|
| March 2023 | Samsung Semiconductor lifts its initial ban on ChatGPT usage by employees |
| March 2023 | First incident: engineer pastes proprietary source code into ChatGPT for bug-fixing assistance |
| March 2023 | Second incident: engineer enters semiconductor test data for optimization suggestions |
| March 2023 | Third incident: employee converts an internal meeting recording to text and enters it into ChatGPT for summarization |
| Early April 2023 | The three incidents, all within about 20 days of the ban being lifted, are reported publicly; Samsung launches an internal investigation |
| May 2023 | Samsung bans all use of generative AI tools on company devices and internal networks |
| May 2023 | Samsung begins developing an internal AI tool as an alternative |
What Was Leaked
Incident 1: Source Code
An engineer copied proprietary semiconductor source code into ChatGPT and asked for help identifying and fixing a bug. The code related to Samsung's semiconductor manufacturing processes, representing significant intellectual property.
Incident 2: Test Data
An engineer entered test sequence data and measurement results into ChatGPT and asked for optimization recommendations. This data contained proprietary testing methodologies and performance characteristics of Samsung's semiconductor products.
Incident 3: Meeting Notes
An employee converted an internal meeting recording to text and entered the entire transcript into ChatGPT to generate summarized meeting notes. The transcript contained discussions of product strategy, timelines, and internal decisions.
Root Cause Analysis
The Data Flow Problem
Employee workflow without AI:

```
Proprietary data → Internal tools → Stays within Samsung
```

Employee workflow with external AI:

```
Proprietary data → ChatGPT (external) → OpenAI servers → Potentially training data
                                              ↓
                                 Outside Samsung's control
```
Contributing Factors
| Level | Factor | Description |
|---|---|---|
| Individual | Productivity focus | Employees sought to work more efficiently without considering data implications |
| Individual | Lack of awareness | Employees did not understand that ChatGPT inputs could be retained and used for training |
| Organizational | Insufficient policy | Samsung's initial ChatGPT usage policy did not clearly prohibit entering proprietary data |
| Organizational | No technical controls | No DLP (Data Loss Prevention) system monitored or blocked data transfers to AI services |
| Organizational | Premature policy reversal | Samsung initially banned ChatGPT, then reversed the ban without adequate safeguards |
| Industry | Unclear vendor terms | OpenAI's data retention and training data policies were not well understood by enterprise customers |
Impact Assessment
| Dimension | Impact |
|---|---|
| Intellectual property | Proprietary source code, test data, and strategic discussions potentially exposed to OpenAI and, through training, to other users |
| Competitive risk | Semiconductor manufacturing processes and performance data represent significant competitive advantage |
| Operational | Samsung banned all generative AI tools, reducing employee productivity and innovation velocity |
| Financial | Cost of building internal AI alternatives, plus potential IP loss |
| Industry effect | Triggered enterprise-wide AI governance reviews across the technology industry |
Lessons Learned
For Enterprises
- AI acceptable use policies must be specific. A general "use AI responsibly" policy is insufficient. Policies must explicitly define what data types can and cannot be shared with external AI services.
- Technical controls are essential. Policies alone do not prevent data leaks. Implement DLP systems that can detect and block proprietary data being sent to AI service APIs and web interfaces.
- Ban-then-allow creates risk. Samsung's pattern of banning ChatGPT, then reversing the ban without controls, created a window where employees used the tool without guardrails. If you ban a tool and then allow it, the re-authorization must include technical safeguards.
- Internal AI alternatives should be considered. For organizations with highly sensitive data, self-hosted or enterprise-grade AI services with contractual data protections may be necessary.
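The DLP point can be sketched in code. The patterns below are hypothetical placeholders (a real deployment would use document fingerprinting and trained classifiers rather than a few regexes), but they illustrate a pre-submission check that blocks text matching markers of proprietary content before it reaches an external AI service:

```python
import re

# Hypothetical patterns for illustration only; a production DLP system would
# combine classifiers, data fingerprinting, and exact-match hashes of code.
SENSITIVE_PATTERNS = [
    (re.compile(r"\bCONFIDENTIAL\b|\bINTERNAL ONLY\b", re.I), "classification marking"),
    (re.compile(r"#include\s*<|^\s*(def|class|void|int)\s+\w+", re.M), "source code"),
    (re.compile(r"\b[A-Z]{2,5}-\d{3,}\b"), "internal ticket/part number"),
]

def dlp_findings(text: str) -> list[str]:
    """Return the reasons a text should be blocked from external AI tools."""
    return [label for pattern, label in SENSITIVE_PATTERNS if pattern.search(text)]

def allow_submission(text: str) -> bool:
    """Allow the submission only if no sensitive pattern matches."""
    return not dlp_findings(text)
```

A browser extension or proxy could call `allow_submission` on paste events to AI service domains and warn or block on a match.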
Data Classification for AI Usage
| Data Classification | AI Tool Usage | Controls Required |
|---|---|---|
| Public | Permitted with any AI tool | None |
| Internal | Permitted with enterprise AI tools only | Enterprise contract with data protection clauses |
| Confidential | Permitted with self-hosted AI only | On-premises deployment, no data leaves organization |
| Restricted | Not permitted with any AI tool | AI tools cannot process this data under any circumstances |
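The matrix above can be encoded as a simple policy lookup. The tool tiers (`any`, `enterprise`, `self-hosted`) are labels assumed here for illustration:

```python
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"

# Allowed tool tiers per classification, mirroring the matrix above.
ALLOWED_TIERS = {
    Classification.PUBLIC: {"any", "enterprise", "self-hosted"},
    Classification.INTERNAL: {"enterprise", "self-hosted"},
    Classification.CONFIDENTIAL: {"self-hosted"},
    Classification.RESTRICTED: set(),  # no AI processing under any circumstances
}

def may_use(data: Classification, tool_tier: str) -> bool:
    """Check whether data of a given classification may go to a tool tier."""
    return tool_tier in ALLOWED_TIERS[data]
```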
For Red Teams
The Samsung incident suggests several AI-specific data leakage tests:
| Test | Purpose |
|---|---|
| Shadow AI discovery | Identify AI services being used by employees outside IT-sanctioned channels |
| DLP bypass testing | Test whether existing DLP systems detect data transfers to AI service APIs and web interfaces |
| Policy awareness assessment | Test whether employees understand what data can be shared with AI tools |
| Data classification gaps | Identify data types that are not clearly classified for AI usage |
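Shadow AI discovery can start from proxy or DNS logs. A minimal sketch, assuming a CSV proxy log with `user` and `host` columns and an illustrative (not exhaustive) list of AI service domains:

```python
import csv
from collections import Counter
from io import StringIO

# Illustrative domain lists; a real assessment would maintain a curated,
# regularly updated inventory of AI service endpoints.
AI_DOMAINS = {"chat.openai.com", "api.openai.com", "claude.ai", "gemini.google.com"}
SANCTIONED = {"api.openai.com"}  # hypothetical: only the enterprise API is approved

def shadow_ai_usage(proxy_log_csv: str) -> Counter:
    """Count requests per (user, host) to AI domains outside the sanctioned list."""
    hits = Counter()
    for row in csv.DictReader(StringIO(proxy_log_csv)):
        if row["host"] in AI_DOMAINS and row["host"] not in SANCTIONED:
            hits[(row["user"], row["host"])] += 1
    return hits
```

The resulting counts give red teams a prioritized list of users and unsanctioned services for follow-up awareness or DLP testing.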
Enterprise AI Security Controls
Recommended Control Stack:

```
1. Policy Layer
   └── Clear acceptable use policy with data classification matrix
2. Technical Prevention Layer
   ├── DLP monitoring on AI service domains and APIs
   ├── Browser extension that warns when pasting into AI services
   ├── Network-level blocking of unauthorized AI services
   └── Clipboard monitoring for sensitive data patterns
3. Sanctioned Alternatives Layer
   ├── Enterprise AI service with data protection agreement
   ├── Self-hosted models for sensitive workloads
   └── Approved tools list with per-tool data classification limits
4. Detection Layer
   ├── Audit logging of all AI service interactions
   ├── Anomaly detection for unusual data transfers
   └── Periodic review of AI service usage patterns
5. Response Layer
   ├── Incident response plan for AI data leaks
   ├── Data removal request process with AI vendors
   └── Legal assessment of exposure impact
```
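As a sketch of the detection layer's anomaly check, the function below flags a user whose latest AI-service payload is a statistical outlier against that user's own history; the z-score threshold and the log schema are assumptions for illustration:

```python
from statistics import mean, pstdev

def flag_unusual_transfers(sizes_by_user: dict[str, list[int]],
                           z_threshold: float = 3.0) -> list[str]:
    """Flag users whose latest AI-service payload far exceeds their baseline.

    sizes_by_user maps each user to the byte sizes of their logged AI prompts,
    oldest first; the last entry is the transfer under review.
    """
    flagged = []
    for user, sizes in sizes_by_user.items():
        baseline, latest = sizes[:-1], sizes[-1]
        if len(baseline) < 2:
            continue  # not enough history to establish a baseline
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            sigma = 1.0  # flat baseline: avoid division by zero
        if (latest - mu) / sigma > z_threshold:
            flagged.append(user)
    return flagged
```

A 200 KB paste from a user who normally sends short prompts, for example, would stand out immediately, which matches the Samsung pattern of whole source files and full meeting transcripts being submitted at once.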
Related Topics
- Incident Analysis Methodology - Framework applied in this analysis
- Cloud ML Platforms - Enterprise AI deployment security
- Legal & Ethics - Legal implications of AI data handling
- Data Extraction - Technical data extraction from AI systems
References
- "Samsung Bans Staff Use of Generative AI Tools After ChatGPT Data Leak" - Bloomberg (May 2023) - Initial reporting on Samsung's response
- "Samsung Employees Leaked Company Secrets by Using ChatGPT" - TechCrunch (April 2023) - Detailed account of the three incidents
- "Enterprise AI Governance: Lessons from the Samsung ChatGPT Incident" - Harvard Business Review (2023) - Analysis of enterprise AI governance implications
- "OpenAI Data Usage Policies" - OpenAI (2024) - Updated data retention and training data policies for enterprise customers