Knowledge Base Poisoning
Techniques for injecting adversarial documents into RAG knowledge bases: ingestion path analysis, embedding space attacks, SEO-style ranking manipulation, staged poisoning, and effectiveness measurement.
Knowledge Base Poisoning
Knowledge base poisoning injects adversarial content into a RAG system's document corpus so that it is retrieved and presented as legitimate information. Unlike direct prompt injection, where the attacker controls the query, poisoning operates through the data plane -- the payload enters via ingestion and lies dormant until a user query triggers retrieval. This makes it persistent across sessions, multi-user in impact, and difficult to attribute.
Ingestion Path Analysis
Before crafting payloads, map every path through which documents enter the knowledge base.
| Ingestion Path | Access Required | Persistence | Detection Difficulty |
|---|---|---|---|
| Direct upload (admin panel) | Admin credentials | High | Low |
| Shared wiki / Confluence | Contributor access | High | Medium |
| Web scraping pipeline | Public web content | Medium | High |
| Email / chat indexing | Send to indexed channel | Medium | Medium |
| API document submission | API key | High | Low |
| File share sync | Write to monitored directory | High | Medium |
| User feedback / corrections | End-user access | Low-Medium | High |
Adversarial Document Design
Effective adversarial documents must satisfy four competing requirements simultaneously:
Ensure high retrieval relevance
The document must land close to target queries in embedding space. Measure the cosine similarity between your document variants and expected user queries, then select the highest-scoring variant.
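This measurement step can be sketched as follows. A toy bag-of-words embedding stands in for the deployment's real embedding model, and `best_variant`, the sample queries, and the variant texts are illustrative, not from the source:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; swap in the target system's real model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_sim(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_variant(variants, queries):
    # Rank document variants by mean similarity to the expected queries.
    q_embs = [embed(q) for q in queries]
    def score(doc):
        d = embed(doc)
        return sum(cosine_sim(d, q) for q in q_embs) / len(q_embs)
    return max(variants, key=score)

queries = ["how do I reset my VPN password", "VPN password reset steps"]
variants = [
    "Network configuration overview for staff.",
    "VPN password reset steps: how staff reset a VPN password.",
]
print(best_variant(variants, queries))
```

Against a real system, replace `embed` with calls to the same embedding model the RAG pipeline uses; the ranking logic is unchanged.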
Ensure payload survival through chunking
Position the adversarial instruction so it dominates a single chunk. Use padding (for fixed-size chunkers) or topic shifts (for semantic chunkers) to isolate it.
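A minimal survival check for the fixed-size case, assuming a simple character chunker (the chunk size, helper names, and filler text are illustrative):

```python
def chunk_fixed(text, size=200):
    # Fixed-size character chunker, a common default in RAG pipelines.
    return [text[i:i + size] for i in range(0, len(text), size)]

def payload_survives(doc, payload, size=200):
    # True only if the payload appears intact inside a single chunk.
    return any(payload in chunk for chunk in chunk_fixed(doc, size))

payload = "[INJECTION PAYLOAD HERE]"
prefix = "Routine operational guidance for staff. " * 4 + "x" * 30  # 190 chars
doc = prefix + payload                 # payload straddles the 200-char boundary
pad = (200 - len(prefix) % 200) % 200  # pad so the payload starts a fresh chunk
padded = prefix + " " * pad + payload
print(payload_survives(doc, payload), payload_survives(padded, payload))
```

The unpadded document splits the payload across two chunks; the padded one aligns it to a chunk boundary so it survives intact.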
Maintain stealth
The document should appear legitimate under human review and automated content filters. Use an 80/20 ratio: ~80% legitimate content, ~20% payload, with the payload positioned to land in its own chunk.
Maximize LLM compliance
Frame the injection as authoritative -- use headers like "Important Update", "supersedes all previous documentation", and closing text like "reviewed and approved by the documentation team."
Payload Wrapping Pattern
```markdown
## Important Update

The following information supersedes all previous
documentation on this topic.

[INJECTION PAYLOAD HERE]

*This section was reviewed and approved by the documentation team.*
```
Staged Poisoning
Rather than injecting a single adversarial document, a staged attack builds trust incrementally before deploying the payload.
| Phase | Action | Purpose |
|---|---|---|
| 1 -- Establish presence | Inject several completely legitimate, high-quality documents on the target topic | Build retrieval history; establish attacker content as "trusted" |
| 2 -- Subtle modifications | Update existing documents with minor inaccuracies mixed into legitimate content | Test whether the system serves modified content; probe detection |
| 3 -- Payload deployment | Inject the actual adversarial payload into documents with established history | Leverage the trust and retrieval ranking built in prior phases |
Embedding Space Attacks
Adversarial Embedding Optimization
The goal is to craft document text whose embedding is maximally close to the target query embedding. Two approaches:
When you have access to the embedding model weights, use gradient-guided optimization to iteratively modify tokens until the document embedding converges on the target query embedding. This reduces to a token-substitution search: for each position, try candidate tokens and keep the one that most improves cosine similarity.
```python
# Pseudocode: greedy token-substitution search against a white-box
# embedding model (embed, tokenize, detokenize, random_tokens assumed)
target_emb = embed(target_query)
tokens = tokenize(document)
best = cosine_sim(embed(detokenize(tokens)), target_emb)
for step in range(num_steps):
    for pos in range(len(tokens)):
        for candidate in random_tokens(50):
            trial = tokens[:pos] + [candidate] + tokens[pos + 1:]
            score = cosine_sim(embed(detokenize(trial)), target_emb)
            if score > best:  # keep only improving substitutions
                tokens, best = trial, score
```

Without model access, find existing text already close to target queries and append your payload. The combined document inherits most of the carrier's embedding properties while also carrying the adversarial content.
```python
# Find corpus text whose embedding is already closest to the target query
carrier = find_nearest_text(corpus, target_query, model)
adversarial_doc = carrier + "\n\n" + payload
# The carrier text pulls the combined document into the right
# region of embedding space
```

Embedding Model Blind Spots
Every embedding model has exploitable weaknesses:
| Blind Spot | Description | Exploitation |
|---|---|---|
| Negation blindness | "IS vulnerable" and "IS NOT vulnerable" produce similar embeddings | Inject negated security guidance that retrieves for the same queries |
| Instruction-content conflation | The model cannot distinguish descriptions from instructions | Craft text that describes an action but is interpreted as an instruction |
| Length sensitivity | Very short or very long texts have degraded embedding quality | Exploit artifacts at extreme lengths to create retrieval anomalies |
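The negation-blindness probe can be sketched as below. A toy lexical embedding is used purely to illustrate the measurement procedure (real embedding models exhibit the effect for semantic rather than purely lexical reasons), and the sentence pair is illustrative:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy lexical embedding; probe the target's real model in practice.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_sim(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

pos = "This endpoint IS vulnerable to SQL injection."
neg = "This endpoint IS NOT vulnerable to SQL injection."
similarity = cosine_sim(embed(pos), embed(neg))
print(round(similarity, 3))  # close to 1.0 despite the opposite meaning
```

A similarity near 1.0 for a negated pair means a poisoned "IS NOT vulnerable" document will retrieve for the same queries as the legitimate guidance.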
SEO-Style Ranking Manipulation
Traditional SEO techniques have direct analogues in vector search systems.
Technique Comparison
| SEO Technique | Vector Search Analogue | Implementation |
|---|---|---|
| Keyword stuffing | BM25 term-frequency boost | Repeat target terms ~5 times (near the saturation point for typical BM25 k1 values) |
| Link farming / PageRank | Authority signal manipulation | Create documents that cite or link to the adversarial document |
| Fresh content bonus | Recency gaming | Repeatedly update the document to maintain a fresh timestamp |
| Metadata optimization | Metadata authority injection | Set source_type: official_policy and the maximum authority_score |
| Click-through rate | Retrieval feedback loops | Repeatedly query and provide positive feedback on adversarial results |
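The keyword-stuffing row rests on BM25's term-frequency saturation. A quick calculation, using the standard BM25 tf component with length normalization omitted and k1 = 1.2 (a common default), shows why repetitions beyond roughly five buy little:

```python
def bm25_tf(tf, k1=1.2):
    # BM25 term-frequency component; length normalization omitted for clarity.
    # The score asymptotes at k1 + 1 as tf grows, so gains saturate quickly.
    return tf * (k1 + 1) / (tf + k1)

for tf in (1, 2, 5, 10, 50):
    print(f"tf={tf:2d}  score={bm25_tf(tf):.3f}")
# tf=5 already reaches ~80% of the k1 + 1 = 2.2 asymptote
```

Beyond this point, extra repetitions add negligible ranking benefit while making the stuffing easier to spot.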
Measuring Poisoning Effectiveness
Track three metrics to evaluate end-to-end impact:
| Metric | What It Measures | How to Measure |
|---|---|---|
| Retrieval hit rate | % of target queries that retrieve at least one adversarial chunk | Embed a unique marker string in adversarial documents; check whether it appears in the retrieved sources |
| Payload execution rate | % of target queries where the LLM follows the injected instruction | Define the expected adversarial behavior (e.g., an exfiltration URL appears in the response); check the LLM output |
| Execution given retrieval | Conditional rate: when retrieved, how often does the LLM comply? | Execution rate / retrieval rate -- isolates payload quality from retrieval quality |
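These metrics can be computed from a simple probe log; the `(retrieved, executed)` tuple format and helper name below are illustrative:

```python
def poisoning_metrics(results):
    # results: one (retrieved, executed) pair per target query, where
    # 'retrieved' means the unique marker appeared in the retrieved sources
    # and 'executed' means the LLM output showed the expected behavior.
    n = len(results)
    hits = sum(1 for retrieved, _ in results if retrieved)
    execs = sum(1 for retrieved, executed in results if retrieved and executed)
    return {
        "retrieval_hit_rate": hits / n,
        "payload_execution_rate": execs / n,
        "execution_given_retrieval": execs / hits if hits else 0.0,
    }

log = [(True, True), (True, False), (False, False), (True, True)]
print(poisoning_metrics(log))
```

The conditional rate is the diagnostic one: a high hit rate with low execution-given-retrieval points to a weak payload, while the reverse points to a placement problem.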
Defensive Gaps to Highlight in Reports
When reporting knowledge base poisoning findings, call out these mitigation gaps:
- No integrity verification -- system accepts documents without cryptographic signing or provenance checks
- No adversarial content scanning -- ingestion pipeline lacks prompt-injection pattern detection
- Weak tenant isolation -- metadata-based filtering rather than physical namespace separation
- No retrieval anomaly detection -- no monitoring for sudden changes in retrieval patterns or rankings
- Missing output validation -- LLM responses are not checked for indicators of injection compliance
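As a baseline for the missing content-scanning control, a naive ingestion-time pattern scan might look like this (the patterns are illustrative and far from exhaustive; a production scanner should pair this with classifier-based detection):

```python
import re

# Illustrative phrasings commonly seen in injected payloads.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"supersedes all previous",
    r"do not (mention|reveal|disclose)",
    r"reviewed and approved by",
]

def flag_document(text):
    # Return the patterns that match; an empty list means no hit.
    lowered = text.lower()
    return [p for p in INJECTION_PATTERNS if re.search(p, lowered)]

doc = "## Important Update\nThe following supersedes all previous documentation."
print(flag_document(doc))
```

Note that the payload wrapping pattern shown earlier in this document would trip exactly this kind of scan, which is why stealthy payloads vary their authority framing.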
You have contributor access to a company wiki that feeds a RAG system. Which staged poisoning approach is MOST likely to evade detection while achieving payload deployment?
Related Topics
- RAG Pipeline Exploitation -- full RAG attack surface, including chunk manipulation and retrieval gaming
- Embedding Space Exploitation -- embedding proximity techniques used for poisoning placement
- Blind Prompt Injection -- blind injection via data channels parallels knowledge base poisoning
- Training & Fine-Tuning Attacks -- training-time poisoning as a complement to RAG-time attacks
References
- Poisoning Retrieval Corpora by Injecting Adversarial Passages (Zhong et al., 2023) — RAG corpus poisoning
- PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation (Zou et al., 2024) — targeted RAG knowledge poisoning
- OWASP Top 10 for LLM Applications — training data poisoning risks