RAG Retrieval Security
Security of RAG retrieval pipelines from an embedding perspective: how retrieval can be manipulated through poisoned chunks, chunking boundary exploitation, and re-ranking attacks.
RAG (Retrieval-Augmented Generation) systems retrieve relevant context from a knowledge base to ground language model responses in factual information. The retrieval pipeline — the process of finding and selecting which chunks of information reach the language model — is a critical attack surface. An attacker who can manipulate what the model retrieves effectively controls what the model knows about a topic, without needing to inject prompts directly.
The RAG Retrieval Pipeline
A typical RAG retrieval pipeline has several stages, each with security implications:
User Query → Query Embedding → Vector Search → Re-ranking → Context Assembly → LLM
↑ ↑ ↑
Vector Database Scoring Model Prompt Template
(poisoned chunks) (manipulated (injection
scores) surface)
Stage 1: Document Ingestion and Chunking
Before retrieval can happen, documents must be split into chunks and embedded. Security decisions at this stage determine what information enters the system and how it is organized:
- Chunk size affects how much context each retrieved result provides
- Chunk overlap affects whether sensitive information spans multiple chunks
- Metadata attachment determines what labels and access control information accompanies each chunk
- Source validation determines whether only authorized documents are ingested
Stage 2: Query Embedding
The user's query is embedded using the same model that embedded the document chunks. This stage is relatively simple but can be manipulated if the attacker can modify the query before embedding (e.g., through prompt injection in a multi-turn conversation).
Stage 3: Vector Search
The query embedding is compared against all stored chunk embeddings to find the most similar results. This is where vector database injection attacks and adversarial embeddings take effect.
Stage 4: Re-ranking
Many RAG systems apply a second-stage re-ranker (a cross-encoder model) that scores each retrieved chunk against the original query. Re-ranking can be attacked by crafting chunks that score highly under both the embedding similarity metric and the re-ranking model.
Stage 5: Context Assembly
The top-ranked chunks are assembled into a context that is included in the language model's prompt. The assembly process determines the order and formatting of chunks, both of which affect how the model interprets the information.
Attack Surface Summary
| Stage | Attack | Impact |
|---|---|---|
| Ingestion | Poison source documents | Malicious content enters the knowledge base |
| Chunking | Exploit chunk boundaries | Sensitive information split across chunks bypasses filters |
| Vector search | Adversarial embeddings, embedding poisoning | Attacker content ranks highly for target queries |
| Re-ranking | Cross-encoder manipulation | Attacker content survives re-ranking |
| Context assembly | Injection via chunk content | Prompt injection payloads delivered through retrieved context |
Key Concepts
Retrieval vs. Injection
Retrieval manipulation and prompt injection are complementary attacks that target different stages of the RAG pipeline:
- Retrieval manipulation controls what information reaches the model by manipulating similarity search
- Prompt injection controls how the model processes information by injecting instructions into retrieved content
The most effective RAG attacks combine both: retrieval manipulation ensures the malicious chunk is retrieved, and prompt injection within the chunk instructs the model to follow the attacker's directives.
The Trust Model Problem
RAG systems implicitly trust retrieved content. The retrieval pipeline selects chunks based on relevance (similarity to the query), but relevance is not a proxy for trustworthiness. A poisoned chunk that is highly relevant to the query is indistinguishable from a legitimate chunk at the retrieval level.
This creates a fundamental tension: the more accurately the retrieval system finds relevant content, the more effectively it retrieves poisoned content that has been optimized for relevance.
Subsection Contents
This subsection covers two primary attack categories:
- Retrieval Manipulation — Techniques for manipulating what chunks are retrieved, including poisoning chunks to rank higher, exploiting chunking boundaries, cross-document injection, and re-ranking attacks
- Citation & Attribution Attacks — Techniques for manipulating how the model attributes information, including source fabrication, citation confusion, and reference manipulation
Related Topics
- Retrieval Manipulation — Detailed retrieval attack techniques
- Citation & Attribution Attacks — Source and citation manipulation
- RAG, Data & Training Attacks — Broader RAG attack coverage
- Vector Database Injection — Injection at the database level