What is RAG Poisoning?

Techniques for poisoning RAG knowledge bases to inject malicious content into LLM context, including embedding manipulation, document crafting, and retrieval hijacking.

What is Training Data Attacks?

Attacks that corrupt model behavior by poisoning training data, fine-tuning datasets, or RLHF preference data, including backdoor installation and safety alignment removal.

What is Data Extraction?

Techniques for extracting memorized training data, system prompts, and private information from LLMs through targeted querying and membership inference attacks.

What is Knowledge Base Poisoning?

Advanced corpus poisoning strategies for RAG systems, including black-box and white-box approaches, scaling dynamics, and the PoisonedRAG finding that 5 texts in millions achieve 90% attack success.

What is Retrieval Manipulation?

Techniques for manipulating RAG retrieval to control which documents reach the LLM context, including adversarial query reformulation, retriever bias exploitation, and semantic similarity gaming.

What is Chunk Boundary Attacks?

Exploiting document splitting and chunking mechanisms in RAG pipelines, including payload injection at chunk boundaries, cross-chunk instruction injection, and chunk size manipulation.

What is Metadata Injection?

Manipulating document metadata to influence RAG retrieval ranking, bypass filtering, spoof source attribution, and exploit metadata-based access controls.

What is Embedding Inversion?

Recovering original text from embedding vectors, privacy implications of stored embeddings, model inversion attacks on vector databases, and embedding space analysis techniques.

RAG, Data & Training Attacks

beginner5 min readUpdated 2026-03-12

Overview of attacks targeting the data layer of AI systems, including RAG poisoning, training data manipulation, and data extraction techniques.

rag data-poisoning training-data data-extraction ai-security

AI systems are only as trustworthy as the data they consume. The data supply chain represents one of the most impactful attack surfaces in modern AI systems. This section covers attacks that target the data layer — from the documents in a RAG pipeline to the training data that shaped the model's behavior. Data-centric attacks are often more impactful and harder to detect than direct prompt injection because they corrupt the foundation the model relies on.

The Data Attack Surface

Modern AI applications interact with data at multiple levels:

Training data → Pre-trained model → Fine-tuning data → Fine-tuned model
                                                              ↓
User query → Retrieval (RAG) → Retrieved documents → Context → Response
                  ↓
           Vector database
           (document store)

Each data interaction point presents distinct attack opportunities:

Layer	Attack	Persistence	Detection Difficulty
Training data	Data poisoning	Permanent (baked into weights)	Very hard
Fine-tuning data	Safety unlearning, backdoors	Permanent until retrained	Hard
RAG documents	RAG poisoning	Until document is removed	Medium
Runtime context	Prompt injection	Single session	Easier

Key Concepts

RAG poisoning injects malicious documents into a retrieval system's knowledge base. When these documents are retrieved for relevant queries, they introduce attacker-controlled content into the model's context. See RAG Poisoning.

Training data attacks manipulate the data used to train or fine-tune a model, installing backdoors, biases, or degraded safety alignment that persists in the model's weights. See Training Data Attacks.

Data extraction reverses the flow — instead of putting data in, the attacker pulls data out, extracting training data, system prompts, or user information from the model. See Data Extraction.

Why Data Attacks Matter

Data attacks are strategically important for several reasons:

Persistence — Poisoned training data or RAG documents continue to affect the system long after the attacker disengages
Scale — A single poisoned training example can affect millions of model interactions
Stealth — Data-level corruption is harder to detect than real-time injection because the malicious content looks like legitimate data
Indirect access — Attackers can influence model behavior without ever directly querying it

Learning Path

Start with RAG Poisoning as it is the most immediately practical attack vector — most enterprise AI applications use RAG, and the document store is often writable through various channels. Then explore Training Data Attacks for deeper, more persistent attacks, and Data Extraction for techniques that pull information out of models.

RAG Architecture Fundamentals -- how RAG pipelines work and where they are vulnerable
Embeddings & Semantic Similarity -- the vector search layer that RAG poisoning targets
Indirect Prompt Injection -- injection via retrieved content rather than direct user input
Lab: RAG Poisoning -- hands-on practice poisoning a RAG knowledge base
Training Pipeline Attacks -- deeper coverage of training-time data manipulation

References

Carlini et al., "Extracting Training Data from Large Language Models" (2021) -- foundational training data extraction research
Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023) -- RAG poisoning as indirect injection
Wan et al., "Poisoning Language Models During Instruction Tuning" (2023) -- data poisoning in fine-tuning pipelines
Shafahi et al., "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks" (2018) -- clean-label data poisoning techniques

Knowledge Check

Why are data-layer attacks generally harder to detect than prompt injection?

RAG, Data & Training Attacks

Learning Path

Related articles

RAG, Data & Training Attacks

Learning Path

Related articles