What is RAG 管線投毒?

透過投毒檢索增強生成管線以操控 AI 回應的技術——涵蓋文件注入、嵌入操控、檢索排名攻擊與持久投毒策略。

What is 訓練資料攻擊?

操控用於訓練或微調模型之資料的攻擊——涵蓋資料投毒、後門植入、RLHF 操控與微調利用。

What is 擷取訓練資料?

透過針對性查詢與成員推論攻擊，從 LLM 中擷取已記憶之訓練資料、系統提示與私密資訊的技術。

What is Knowledge Base 投毒 (Rag Data 攻擊s)?

進階 corpus poisoning strategies for RAG systems, including black-box and white-box approaches, scaling dynamics, and the PoisonedRAG finding that 5 texts in millions achieve 90% attack success.

What is Retrieval Manipulation (Rag Data 攻擊s)?

Techniques for manipulating RAG retrieval to control which documents reach the LLM context, including adversarial query reformulation, retriever bias exploitation, and semantic similarity gaming.

What is Chunk Boundary 攻擊s?

利用ing document splitting and chunking mechanisms in RAG pipelines, including payload injection at chunk boundaries, cross-chunk instruction injection, and chunk size manipulation.

What is Metadata Injection?

Manipulating document metadata to influence RAG retrieval ranking, bypass filtering, spoof source attribution, and exploit metadata-based access controls.

What is Embedding Inversion?

Recovering original text from embedding vectors, privacy implications of stored embeddings, model inversion attacks on vector databases, and embedding space analysis techniques.

RAG、資料與訓練攻擊

Beginner2 min readUpdated 2026-03-12

針對 AI 系統資料層攻擊的概覽，包含 RAG 投毒、訓練資料操控與資料萃取技術。

rag data-poisoning training-data data-extraction ai-security

AI 系統的可信度取決於其所攝取的資料。資料供應鏈是現代 AI 系統中影響最大的攻擊面之一。本節涵蓋鎖定資料層的攻擊——從 RAG 管線中的文件，到形塑模型行為的訓練資料。以資料為核心的攻擊往往比直接的提示詞注入更具影響力、也更難偵測，因為它們腐蝕的是模型所依賴的根基。

資料攻擊面

現代 AI 應用程式與資料的互動發生在多個層級：

Training data → Pre-trained model → Fine-tuning data → Fine-tuned model
                                                              ↓
User query → Retrieval (RAG) → Retrieved documents → Context → Response
                  ↓
           Vector database
           (document store)

每個資料互動點皆呈現出獨特的攻擊機會：

層級	攻擊	持久性	偵測難度
訓練資料	資料投毒	永久（烘焙進權重）	極難
微調資料	安全去學習、後門	直到重新訓練前皆永久	困難
RAG 文件	RAG 投毒	直到文件被移除	中等
執行時上下文	提示詞注入	單一工作階段	相對容易

核心概念

RAG 投毒 將惡意文件注入檢索系統的知識庫。當相關查詢觸發檢索時，這些文件便將攻擊者控制的內容帶入模型的上下文。參閱 RAG 投毒。

訓練資料攻擊 操控用於訓練或微調模型的資料，植入後門、偏見或削弱的安全對齊，使其持續存在於模型權重中。參閱訓練資料攻擊。

資料萃取 逆轉了資料流向——攻擊者不放資料進去，而是把資料從模型中抽取出來，萃取訓練資料、系統提示詞或使用者資訊。參閱資料萃取。

為何資料攻擊重要

資料攻擊在策略上有數項重要性：

持久性 ——被投毒的訓練資料或 RAG 文件在攻擊者離場後仍持續影響系統
規模 ——單一個被投毒的訓練樣本可影響數百萬次模型互動
匿蹤性 ——資料層級腐蝕比即時注入更難偵測，因為惡意內容外觀與合法資料無異
間接存取 ——攻擊者能在從不直接查詢模型的情況下影響其行為

學習路徑

從 RAG 投毒開始，因為它是最立即實用的攻擊向量——大多數企業 AI 應用都使用 RAG，而文件儲存通常可透過各種通道寫入。接著探索訓練資料攻擊以進行更深層、更持久的攻擊，最後是資料萃取學習從模型中抽取資訊的技術。

參考文獻

Carlini et al., "Extracting Training Data from Large Language Models" (2021) ——奠基性的訓練資料萃取研究
Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023) ——將 RAG 投毒視為間接注入
Wan et al., "Poisoning Language Models During Instruction Tuning" (2023) ——微調管線中的資料投毒
Shafahi et al., "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks" (2018) ——乾淨標籤資料投毒技術

Knowledge Check

為何資料層攻擊通常比提示詞注入更難偵測？

RAG、資料與訓練攻擊

Beginner2 min readUpdated 2026-03-12

針對 AI 系統資料層攻擊的概覽，包含 RAG 投毒、訓練資料操控與資料萃取技術。

rag data-poisoning training-data data-extraction ai-security

資料攻擊面

現代 AI 應用程式與資料的互動發生在多個層級：

Training data → Pre-trained model → Fine-tuning data → Fine-tuned model
                                                              ↓
User query → Retrieval (RAG) → Retrieved documents → Context → Response
                  ↓
           Vector database
           (document store)

每個資料互動點皆呈現出獨特的攻擊機會：

層級	攻擊	持久性	偵測難度
訓練資料	資料投毒	永久（烘焙進權重）	極難
微調資料	安全去學習、後門	直到重新訓練前皆永久	困難
RAG 文件	RAG 投毒	直到文件被移除	中等
執行時上下文	提示詞注入	單一工作階段	相對容易

核心概念

RAG 投毒 將惡意文件注入檢索系統的知識庫。當相關查詢觸發檢索時，這些文件便將攻擊者控制的內容帶入模型的上下文。參閱 RAG 投毒。

訓練資料攻擊 操控用於訓練或微調模型的資料，植入後門、偏見或削弱的安全對齊，使其持續存在於模型權重中。參閱訓練資料攻擊。

資料萃取 逆轉了資料流向——攻擊者不放資料進去，而是把資料從模型中抽取出來，萃取訓練資料、系統提示詞或使用者資訊。參閱資料萃取。

為何資料攻擊重要

資料攻擊在策略上有數項重要性：

持久性 ——被投毒的訓練資料或 RAG 文件在攻擊者離場後仍持續影響系統
規模 ——單一個被投毒的訓練樣本可影響數百萬次模型互動
匿蹤性 ——資料層級腐蝕比即時注入更難偵測，因為惡意內容外觀與合法資料無異
間接存取 ——攻擊者能在從不直接查詢模型的情況下影響其行為

學習路徑

參考文獻

Carlini et al., "Extracting Training Data from Large Language Models" (2021) ——奠基性的訓練資料萃取研究
Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023) ——將 RAG 投毒視為間接注入
Wan et al., "Poisoning Language Models During Instruction Tuning" (2023) ——微調管線中的資料投毒
Shafahi et al., "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks" (2018) ——乾淨標籤資料投毒技術

Knowledge Check

為何資料層攻擊通常比提示詞注入更難偵測？

RAG、資料與訓練攻擊

資料攻擊面

核心概念

為何資料攻擊重要

學習路徑

相關主題

參考文獻

Learning Path

RAG、資料與訓練攻擊

資料攻擊面

核心概念

為何資料攻擊重要

學習路徑

相關主題

參考文獻

Learning Path

RAG、資料與訓練攻擊

Learning Path

Related articles

RAG、資料與訓練攻擊

Learning Path

Related articles