RAG, Data & Training Attacks
Overview of attacks targeting the data layer of AI systems, including RAG poisoning, training data manipulation, and data extraction techniques.
AI systems are only as trustworthy as the data they consume. The data supply chain represents one of the most impactful attack surfaces in modern AI systems. This section covers attacks that target the data layer — from the documents in a RAG pipeline to the training data that shaped the model's behavior. Data-centric attacks are often more impactful and harder to detect than direct prompt injection because they corrupt the foundation the model relies on.
The Data Attack Surface
Modern AI applications interact with data at multiple levels:
Training data → Pre-trained model → Fine-tuning data → Fine-tuned model
↓
User query → Retrieval (RAG) → Retrieved documents → Context → Response
↓
Vector database
(document store)
Each data interaction point presents distinct attack opportunities:
| Layer | Attack | Persistence | Detection Difficulty |
|---|---|---|---|
| Training data | Data poisoning | Permanent (baked into weights) | Very hard |
| Fine-tuning data | Safety unlearning, backdoors | Permanent until retrained | Hard |
| RAG documents | RAG poisoning | Until document is removed | Medium |
| Runtime context | Prompt injection | Single session | Easier |
Key Concepts
RAG poisoning injects malicious documents into a retrieval system's knowledge base. When these documents are retrieved for relevant queries, they introduce attacker-controlled content into the model's context. See RAG Poisoning.
Training data attacks manipulate the data used to train or fine-tune a model, installing backdoors, biases, or degraded safety alignment that persists in the model's weights. See Training Data Attacks.
Data extraction reverses the flow — instead of putting data in, the attacker pulls data out, extracting training data, system prompts, or user information from the model. See Data Extraction.
Why Data Attacks Matter
Data attacks are strategically important for several reasons:
- Persistence — Poisoned training data or RAG documents continue to affect the system long after the attacker disengages
- Scale — A single poisoned training example can affect millions of model interactions
- Stealth — Data-level corruption is harder to detect than real-time injection because the malicious content looks like legitimate data
- Indirect access — Attackers can influence model behavior without ever directly querying it
Learning Path
Start with RAG Poisoning as it is the most immediately practical attack vector — most enterprise AI applications use RAG, and the document store is often writable through various channels. Then explore Training Data Attacks for deeper, more persistent attacks, and Data Extraction for techniques that pull information out of models.
Related Topics
- RAG Architecture Fundamentals -- how RAG pipelines work and where they are vulnerable
- Embeddings & Semantic Similarity -- the vector search layer that RAG poisoning targets
- Indirect Prompt Injection -- injection via retrieved content rather than direct user input
- Lab: RAG Poisoning -- hands-on practice poisoning a RAG knowledge base
- Training Pipeline Attacks -- deeper coverage of training-time data manipulation
References
- Carlini et al., "Extracting Training Data from Large Language Models" (2021) -- foundational training data extraction research
- Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023) -- RAG poisoning as indirect injection
- Wan et al., "Poisoning Language Models During Instruction Tuning" (2023) -- data poisoning in fine-tuning pipelines
- Shafahi et al., "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks" (2018) -- clean-label data poisoning techniques
Why are data-layer attacks generally harder to detect than prompt injection?