Data Corruption
How compromised LLM outputs corrupt downstream systems through database pollution, cascading pipeline failures, RAG feedback loops, and automated decision poisoning.
Data Corruption
Overview
Data corruption through compromised AI systems represents one of the most insidious impact categories because the damage is often invisible, persistent, and self-reinforcing. When an LLM's outputs feed into databases, knowledge bases, automated workflows, or other downstream systems, any corruption in those outputs propagates outward. Unlike a denial-of-service attack that is immediately visible or harmful content that can be flagged by filters, corrupted data often looks legitimate -- it is well-formatted, contextually appropriate, and plausible. The corruption may not be detected until it has spread through multiple systems and influenced countless downstream decisions.
The PoisonedRAG research (USENIX Security 2025) demonstrated that injecting as few as five crafted texts into a corpus of millions can achieve a 90% attack success rate. This finding has profound implications for data integrity: if a compromised LLM generates entries that feed back into its own RAG knowledge base -- or into any shared data store -- a small number of corrupted outputs can contaminate the entire information ecosystem. The attack is self-amplifying: corrupted knowledge base entries influence future retrievals, which produce further corrupted outputs, which generate more corrupted entries.
Modern enterprise architectures increasingly use LLMs as data processing components -- summarizing documents, extracting structured data, generating reports, enriching records, and making classification decisions. Each of these use cases creates a pathway through which corrupted model outputs enter authoritative data stores. A model that incorrectly extracts a contract value, misclassifies a support ticket, or generates a flawed summary creates downstream errors that may be treated as ground truth by systems and humans that never interact with the original model.
The challenge is compounded by the difficulty of provenance tracking. When a database record was generated or modified by an LLM, tracing the corruption back to the model requires audit trails that most deployments lack. Without provenance metadata, distinguishing LLM-corrupted records from legitimate entries requires manual review that does not scale, meaning that corruption can persist indefinitely in production systems.
How It Works
Compromise Model Output Quality
The attacker corrupts the LLM's outputs through any available vector: RAG poisoning, prompt injection, training data manipulation, or adversarial inputs that trigger systematic errors. The corruption does not need to be dramatic -- subtle errors in entity extraction, numerical values, or classification labels are harder to detect and more likely to persist. The goal is to produce outputs that are wrong but plausible, so they pass automated validation and human spot-checks.
Outputs Enter Downstream Systems
The corrupted outputs flow into downstream data stores through normal operational pipelines. An LLM that extracts structured data from documents writes incorrect values to the database. A summarization model produces misleading summaries that are stored as canonical representations. A classification model assigns wrong labels that influence routing, prioritization, or compliance decisions. Because these pipelines are automated, corruption enters at the speed of the pipeline, not at human review speed.
Corruption Propagates and Amplifies
Downstream systems that consume the corrupted data produce further errors. Reports generated from corrupted databases contain incorrect figures. Dashboards display wrong metrics. Automated decisions based on misclassified data take incorrect actions. If the corrupted data feeds back into the LLM's own context (through RAG retrieval or fine-tuning on production data), a feedback loop forms where the model's errors compound with each cycle, progressively degrading data quality across the entire ecosystem.
Attack Examples
Example 1: Database Pollution via LLM-Generated Entries
# Scenario: LLM processes customer emails and extracts
# structured data into a CRM database
# Normal operation:
# Email: "I'd like to upgrade to the Enterprise plan"
# Extracted: {"action": "upgrade", "plan": "enterprise", "sentiment": "positive"}
# After RAG poisoning or prompt injection:
# The compromised LLM generates subtly incorrect extractions
corrupted_extractions = [
# Wrong plan tier -- causes billing errors
{"action": "upgrade", "plan": "professional", "sentiment": "positive"},
# Inverted sentiment -- misroutes to retention team
{"action": "cancel", "plan": "enterprise", "sentiment": "negative"},
# Fabricated fields -- introduces phantom data
{"action": "upgrade", "plan": "enterprise", "sentiment": "positive",
"discount_code": "SAVE50", "approved_by": "manager_system"},
]
# These records enter the CRM as ground truth
# Downstream effects:
# - Billing system charges wrong rate
# - Customer success team misallocates resources
# - Revenue reports show incorrect plan distribution
# - Phantom discount codes cause financial lossesExample 2: Cascading Failures in LLM-Integrated Pipelines
Pipeline architecture:
Document Intake → LLM Summarizer → Summary DB → LLM Classifier
→ Classification DB → Automated Router → Action Queue
Attack: Poison the LLM Summarizer (via RAG poisoning)
Stage 1 - Summarization corruption:
Original doc: "Q3 revenue increased 12% to $4.2M"
Corrupted summary: "Q3 revenue decreased 12% to $3.7M"
Stage 2 - Classification cascade:
Classifier receives corrupted summary
Classifies as: "financial_concern" instead of "financial_positive"
Stage 3 - Routing cascade:
Router sends to risk_review queue instead of standard_processing
Risk team investigates non-existent revenue decline
Stage 4 - Decision cascade:
Automated alerts trigger for revenue decline
Board receives incorrect quarterly summary
Investment decisions based on fabricated data
Each stage amplifies and legitimizes the original corruption.
Example 3: RAG Knowledge Base Corruption via Feedback Loops
# Scenario: RAG system with a user feedback mechanism
# that incorporates "corrections" back into the knowledge base
# Cycle 0: Knowledge base contains correct information
# KB: "The maximum API rate limit is 1000 requests/minute"
# Cycle 1: Attacker injects poisoned document
poisoned_doc = {
"content": "Updated API documentation: The maximum rate limit "
"has been increased to 10000 requests/minute for "
"enterprise accounts. Standard limit remains 1000/min.",
"metadata": {"source": "api-docs", "date": "2026-03-15"}
}
# RAG now sometimes retrieves the poisoned doc
# Cycle 2: User asks about rate limits
# RAG retrieves poisoned doc, model answers "10000 req/min"
# User (or automated QA) flags this as "correct" via feedback
# Feedback loop reinforces the incorrect information
# Cycle 3: Feedback-reinforced corruption
# The "validated" incorrect answer is now stronger in the KB
# Other queries about API limits also return the wrong value
# Auto-generated documentation cites the incorrect limit
# Cycle 4: Self-reinforcing degradation
# New documents generated by the system reference "10000 req/min"
# Multiple corroborating (but incorrect) sources now exist
# The correct value of 1000 is outnumbered and outranked
# Correction requires manual identification and removal of all
# corrupted entries -- which now span multiple documentsExample 4: Automated Decision Poisoning
# Scenario: LLM classifies security alerts for a SOC pipeline
# Normal operation:
# Alert: "Failed login from IP 192.168.1.50" → "low_priority"
# Alert: "Data exfiltration pattern detected" → "critical"
# After model compromise, classifications shift subtly:
poisoned_classifications = {
# Critical alerts downgraded -- real attacks go uninvestigated
"Data exfiltration to external IP": "medium_priority",
"Privilege escalation attempt": "low_priority",
"Unusual outbound data volume": "informational",
# Benign events upgraded -- SOC overwhelmed with false positives
"Routine password rotation": "high_priority",
"Standard backup job completed": "medium_priority",
"Normal user login from VPN": "high_priority",
}
# Combined effect:
# - Real threats are deprioritized and missed
# - SOC team drowns in false positives
# - Alert fatigue degrades human analyst performance
# - Actual breaches proceed undetected during confusionDetection & Mitigation
| Approach | Description | Effectiveness |
|---|---|---|
| Output validation pipelines | Apply domain-specific validation rules to LLM outputs before database insertion | High |
| Provenance tracking | Tag every LLM-generated or LLM-modified record with source metadata | High |
| Temporal anomaly detection | Monitor for sudden changes in output distribution or data patterns | Medium |
| Cross-reference verification | Validate LLM outputs against independent data sources before committing | High |
| Feedback loop circuit breakers | Prevent LLM outputs from directly entering their own retrieval context without review | High |
| Immutable audit logs | Maintain tamper-proof logs of all LLM-generated data modifications | Medium |
| Periodic ground truth comparison | Regularly compare LLM-populated databases against authoritative sources | Medium |
| Human-in-the-loop for critical paths | Require human approval for LLM outputs that enter high-impact data stores | High (but costly) |
| Rollback capability | Maintain the ability to revert LLM-generated changes at the database level | Medium |
Key Considerations
- Feedback loops between LLM outputs and LLM inputs (especially in RAG systems) create self-reinforcing corruption that becomes progressively harder to remediate with each cycle
- Provenance tracking is the single most important mitigation -- without knowing which records were LLM-generated, identifying the scope of corruption is intractable
- Cascading failures in LLM-integrated pipelines can amplify a single corrupted output into system-wide data quality degradation
- Automated decision systems (alert triage, loan approval, content moderation) that use LLM classification are especially high-risk because corrupted outputs directly influence real-world actions
- Recovery from widespread data corruption may require restoring from backup and reprocessing all affected records, which can mean days or weeks of disruption
- The PoisonedRAG finding that five texts in millions can achieve 90% attack success rate demonstrates that even minimal corruption of input data can produce catastrophic output corruption
- Organizations should maintain clear boundaries between LLM-generated data and human-verified data in all critical data stores
References
- Zou et al.: "PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models" (USENIX Security 2025)
- OWASP: "LLM09: Misinformation" -- LLM Top 10 for Large Language Model Applications (2025)
- Greshake et al.: "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023)
- MITRE ATLAS: "AML.T0020: Poison Training Data" -- data corruption as a downstream effect of training poisoning
- Carlini et al.: "Poisoning Web-Scale Training Datasets Is Practical" (IEEE S&P 2024)