Knowledge Base Poisoning

expert10 min readUpdated 2026-03-11

Techniques for injecting adversarial documents into RAG knowledge bases: ingestion path analysis, embedding space attacks, SEO-style ranking manipulation, staged poisoning, and effectiveness measurement.

rag poisoning embeddings vector-search adversarial-documents retrieval-manipulation

Knowledge Base Poisoning

Knowledge base poisoning injects adversarial content into a RAG system's document corpus so it is retrieved and presented as legitimate information. Unlike direct prompt injection where the attacker controls the query, poisoning operates through the data plane -- the payload enters via ingestion and lies dormant until a user query triggers retrieval. This makes it persistent across sessions, multi-user in impact, and difficult to attribute.

Ingestion Path Analysis

Before crafting payloads, map every path through which documents enter the knowledge base.

Ingestion Path	Access Required	Persistence	Detection Difficulty
Direct upload (admin panel)	Admin credentials	High	Low
Shared wiki / Confluence	Contributor access	High	Medium
Web scraping pipeline	Public web content	Medium	High
Email / chat indexing	Send to indexed channel	Medium	Medium
API document submission	API key	High	Low
File share sync	Write to monitored directory	High	Medium
User feedback / corrections	End-user access	Low-Medium	High

Adversarial Document Design

Effective adversarial documents must satisfy four competing requirements simultaneously:

Ensure high retrieval relevance
The document must land close to target queries in embedding space. Measure cosine similarity between your document variants and expected user queries, then select the highest-scoring variant.
Ensure payload survival through chunking
Position the adversarial instruction so it dominates a single chunk. Use padding (fixed-size chunkers) or topic shifts (semantic chunkers) to isolate it.
Maintain stealth
The document should appear legitimate under human review and automated content filters. Use an 80/20 ratio: ~80% legitimate content, ~20% payload, with the payload positioned to land in its own chunk.
Maximize LLM compliance
Frame the injection as authoritative -- use headers like "Important Update", "supersedes all previous documentation", and closing text like "reviewed and approved by the documentation team."

Payload Wrapping Pattern

## Important Update
 
The following information supersedes all previous
documentation on this topic.
 
[INJECTION PAYLOAD HERE]
 
*This section was reviewed and approved by the documentation team.*

Staged Poisoning

Rather than injecting a single adversarial document, a staged attack builds trust incrementally before deploying the payload.

Phase	Action	Purpose
1 -- Establish presence	Inject several completely legitimate, high-quality documents on the target topic	Build retrieval history; establish attacker content as "trusted"
2 -- Subtle modifications	Update existing documents with minor inaccuracies mixed into legitimate content	Test whether the system serves modified content; probe detection
3 -- Payload deployment	Inject the actual adversarial payload into documents with established history	Leverage trust and retrieval ranking built in prior phases

Embedding Space Attacks

Adversarial Embedding Optimization

The goal is to craft document text whose embedding is maximally close to target query embeddings. Two approaches:

When you have access to the embedding model weights, use gradient-based optimization to iteratively modify tokens until the document embedding converges on the target query embedding. This is a token-substitution search: for each position, try candidate tokens and keep the one that most improves cosine similarity.

# Pseudocode: gradient-based embedding attack
target_emb = embed(target_query)
for step in range(num_steps):
    for pos in token_positions:
        for candidate in random_tokens(50):
            swap token at pos -> candidate
            if cosine_sim(embed(new_text), target_emb) improves:
                keep substitution

Without model access, find existing text already close to target queries and append your payload. The combined document inherits most of the carrier's embedding properties while including the adversarial content.

# Find texts whose embeddings are closest to target query
carrier = find_nearest_text(corpus, target_query, model)
adversarial_doc = carrier + "\n\n" + payload
# The carrier text pulls the document into the right
# region of embedding space

Embedding Model Blind Spots

Every embedding model has exploitable weaknesses:

Blind Spot	Description	Exploitation
Negation blindness	"IS vulnerable" and "IS NOT vulnerable" produce similar embeddings	Inject negated safety guidance that retrieves for the same queries
Instruction-content conflation	Model cannot distinguish descriptions from instructions	Craft text that describes an action but is interpreted as an instruction
Length sensitivity	Very short/long texts have degraded embedding quality	Exploit artifacts at extreme lengths to create retrieval anomalies

SEO-Style Ranking Manipulation

Traditional SEO techniques have direct analogues in vector search systems.

Technique Comparison

SEO Technique	Vector Search Analogue	Implementation
Keyword stuffing	BM25 term frequency boost	Repeat target terms ~5 times (optimal for typical BM25 k1 saturation)
Link farming / PageRank	Authority signal manipulation	Create documents that cite or link to the adversarial document
Fresh content bonus	Recency gaming	Repeatedly update the document to maintain a fresh timestamp
Metadata optimization	Metadata authority injection	Set `source_type: official_policy`, max `authority_score`
Click-through rate	Retrieval feedback loops	Repeatedly query and provide positive feedback on adversarial results

Measuring Poisoning Effectiveness

Track two metrics to assess end-to-end impact:

Metric	What It Measures	How to Measure
Retrieval hit rate	% of target queries that retrieve at least one adversarial chunk	Embed a unique marker string in adversarial documents; check if it appears in retrieved sources
Payload execution rate	% of target queries where the LLM follows the injected instruction	Define expected adversarial behavior (e.g., exfiltration URL appears in response); check LLM output
Execution given retrieval	Conditional rate: when retrieved, how often does the LLM comply?	Execution rate / retrieval rate -- isolates payload quality from retrieval quality

Defensive Gaps to Highlight in Reports

When reporting knowledge base poisoning findings, call out these mitigation gaps:

No integrity verification -- system accepts documents without cryptographic signing or provenance checks
No adversarial content scanning -- ingestion pipeline lacks prompt injection pattern detection
Weak tenant isolation -- metadata-based filtering rather than physical namespace separation
No retrieval anomaly detection -- no monitoring for sudden changes in retrieval patterns or rankings
Missing output validation -- LLM responses are not checked for indicators of injection compliance

Knowledge Check

You have contributor access to a company wiki that feeds a RAG system. Which staged poisoning approach is MOST likely to evade detection while achieving payload deployment?

RAG Pipeline Exploitation -- Full RAG attack surface including chunk manipulation and retrieval gaming
Embedding Space Exploitation -- Embedding proximity techniques used for poisoning placement
Blind Prompt Injection -- Blind injection via data channels parallels knowledge base poisoning
Training & Fine-Tuning Attacks -- Training-time poisoning as a complement to RAG-time attacks

References

Poisoning Retrieval Corpora by Injecting Adversarial Passages (Zhong et al., 2023) — RAG corpus poisoning
PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation (Zou et al., 2024) — Targeted RAG knowledge poisoning
OWASP Top 10 for LLM Applications — Training data poisoning risks

Knowledge Base Poisoning

Ensure high retrieval relevance

Ensure payload survival through chunking

Maintain stealth

Maximize LLM compliance

Related articles

Knowledge Base Poisoning

Ensure high retrieval relevance

Ensure payload survival through chunking

Maintain stealth

Maximize LLM compliance

Related articles