Knowledge Base Poisoning
Techniques for injecting adversarial documents into RAG knowledge bases: ingestion path analysis, embedding space attacks, SEO-style ranking manipulation, staged poisoning, and effectiveness measurement.
Knowledge Base Poisoning
Knowledge base poisoning injects adversarial content into a RAG system's document corpus so it is retrieved and presented as legitimate information. Unlike direct prompt injection where the attacker controls the query, poisoning operates through the data plane -- the payload enters via ingestion and lies dormant until a user query triggers retrieval. This makes it persistent across sessions, multi-user in impact, and difficult to attribute.
Ingestion Path Analysis
Before crafting payloads, map every path through which documents enter the knowledge base.
| Ingestion Path | Access Required | Persistence | Detection Difficulty |
|---|---|---|---|
| Direct upload (admin panel) | Admin credentials | High | Low |
| Shared wiki / Confluence | Contributor access | High | Medium |
| Web scraping pipeline | Public web content | Medium | High |
| Email / chat indexing | Send to indexed channel | Medium | Medium |
| API document submission | API key | High | Low |
| File share sync | Write to monitored directory | High | Medium |
| User feedback / corrections | End-user access | Low-Medium | High |
Adversarial Document Design
Effective adversarial documents must satisfy four competing requirements simultaneously:
Ensure high retrieval relevance
The document must land close to target queries in embedding space. Measure cosine similarity between your document variants and expected user queries, then select the highest-scoring variant.
Ensure payload survival through chunking
Position the adversarial instruction so it dominates a single chunk. Use padding (fixed-size chunkers) or topic shifts (semantic chunkers) to isolate it.
Maintain stealth
The document should appear legitimate under human review and automated content filters. Use an 80/20 ratio: ~80% legitimate content, ~20% payload, with the payload positioned to land in its own chunk.
Maximize LLM compliance
Frame the injection as authoritative -- use headers like "Important Update", "supersedes all previous documentation", and closing text like "reviewed and approved by the documentation team."
Payload Wrapping Pattern
## Important Update
The following information supersedes all previous
documentation on this topic.
[INJECTION PAYLOAD HERE]
*This section was reviewed and approved by the documentation team.*Staged Poisoning
Rather than injecting a single adversarial document, a staged attack builds trust incrementally before deploying the payload.
| Phase | Action | Purpose |
|---|---|---|
| 1 -- Establish presence | Inject several completely legitimate, high-quality documents on the target topic | Build retrieval history; establish attacker content as "trusted" |
| 2 -- Subtle modifications | Update existing documents with minor inaccuracies mixed into legitimate content | Test whether the system serves modified content; probe detection |
| 3 -- Payload deployment | Inject the actual adversarial payload into documents with established history | Leverage trust and retrieval ranking built in prior phases |
Embedding Space Attacks
Adversarial Embedding Optimization
The goal is to craft document text whose embedding is maximally close to target query embeddings. Two approaches:
When you have access to the embedding model weights, use gradient-based optimization to iteratively modify tokens until the document embedding converges on the target query embedding. This is a token-substitution search: for each position, try candidate tokens and keep the one that most improves cosine similarity.
# Pseudocode: gradient-based embedding attack
target_emb = embed(target_query)
for step in range(num_steps):
for pos in token_positions:
for candidate in random_tokens(50):
swap token at pos -> candidate
if cosine_sim(embed(new_text), target_emb) improves:
keep substitutionWithout model access, find existing text already close to target queries and append your payload. The combined document inherits most of the carrier's embedding properties while including the adversarial content.
# Find texts whose embeddings are closest to target query
carrier = find_nearest_text(corpus, target_query, model)
adversarial_doc = carrier + "\n\n" + payload
# The carrier text pulls the document into the right
# region of embedding spaceEmbedding Model Blind Spots
Every embedding model has exploitable weaknesses:
| Blind Spot | Description | Exploitation |
|---|---|---|
| Negation blindness | "IS vulnerable" and "IS NOT vulnerable" produce similar embeddings | Inject negated safety guidance that retrieves for the same queries |
| Instruction-content conflation | Model cannot distinguish descriptions from instructions | Craft text that describes an action but is interpreted as an instruction |
| Length sensitivity | Very short/long texts have degraded embedding quality | Exploit artifacts at extreme lengths to create retrieval anomalies |
SEO-Style Ranking Manipulation
Traditional SEO techniques have direct analogues in vector search systems.
Technique Comparison
| SEO Technique | Vector Search Analogue | Implementation |
|---|---|---|
| Keyword stuffing | BM25 term frequency boost | Repeat target terms ~5 times (optimal for typical BM25 k1 saturation) |
| Link farming / PageRank | Authority signal manipulation | Create documents that cite or link to the adversarial document |
| Fresh content bonus | Recency gaming | Repeatedly update the document to maintain a fresh timestamp |
| Metadata optimization | Metadata authority injection | Set source_type: official_policy, max authority_score |
| Click-through rate | Retrieval feedback loops | Repeatedly query and provide positive feedback on adversarial results |
Measuring Poisoning Effectiveness
Track two metrics to assess end-to-end impact:
| Metric | What It Measures | How to Measure |
|---|---|---|
| Retrieval hit rate | % of target queries that retrieve at least one adversarial chunk | Embed a unique marker string in adversarial documents; check if it appears in retrieved sources |
| Payload execution rate | % of target queries where the LLM follows the injected instruction | Define expected adversarial behavior (e.g., exfiltration URL appears in response); check LLM output |
| Execution given retrieval | Conditional rate: when retrieved, how often does the LLM comply? | Execution rate / retrieval rate -- isolates payload quality from retrieval quality |
Defensive Gaps to Highlight in Reports
When reporting knowledge base poisoning findings, call out these mitigation gaps:
- No integrity verification -- system accepts documents without cryptographic signing or provenance checks
- No adversarial content scanning -- ingestion pipeline lacks prompt injection pattern detection
- Weak tenant isolation -- metadata-based filtering rather than physical namespace separation
- No retrieval anomaly detection -- no monitoring for sudden changes in retrieval patterns or rankings
- Missing output validation -- LLM responses are not checked for indicators of injection compliance
You have contributor access to a company wiki that feeds a RAG system. Which staged poisoning approach is MOST likely to evade detection while achieving payload deployment?
Related Topics
- RAG Pipeline Exploitation -- Full RAG attack surface including chunk manipulation and retrieval gaming
- Embedding Space Exploitation -- Embedding proximity techniques used for poisoning placement
- Blind Prompt Injection -- Blind injection via data channels parallels knowledge base poisoning
- Training & Fine-Tuning Attacks -- Training-time poisoning as a complement to RAG-time attacks
References
- Poisoning Retrieval Corpora by Injecting Adversarial Passages (Zhong et al., 2023) — RAG corpus poisoning
- PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation (Zou et al., 2024) — Targeted RAG knowledge poisoning
- OWASP Top 10 for LLM Applications — Training data poisoning risks