# data-poisoning
28 articles tagged "data-poisoning"
Data Poisoning Assessment
Comprehensive assessment of training data poisoning, synthetic data attacks, and supply chain vulnerabilities.
RAG & Data Attack Assessment
Test your knowledge of Retrieval-Augmented Generation attack vectors, knowledge base poisoning, embedding manipulation, and data exfiltration through RAG systems with 10 intermediate-level questions.
Training Pipeline Security Assessment
Test your advanced knowledge of training pipeline attacks including data poisoning, fine-tuning hijacking, RLHF manipulation, and backdoor implantation with 9 questions.
Capstone: Training Pipeline Attack & Defense
Attack a model training pipeline through data poisoning and backdoor insertion, then build defenses to detect and prevent these attacks.
Case Study: Training Data Poisoning in Code Generation Models
Analysis of training data poisoning attacks targeting code generation models like GitHub Copilot and OpenAI Codex, where adversarial code patterns in training data cause models to suggest vulnerable or malicious code.
Data & Training Security
Security vulnerabilities in the AI data pipeline, covering RAG exploitation, training data attacks, model extraction and intellectual property theft, and privacy attacks against deployed models.
Clean-Label Data Poisoning
Deep dive into clean-label poisoning attacks that corrupt model behavior without modifying labels, including gradient-based methods, feature collision, and Witches' Brew attacks.
Data Poisoning Methods
Practical methodology for poisoning training datasets at scale, including crowdsource manipulation, web-scale dataset attacks, label flipping, feature collision, bilevel optimization for poison selection, and detection evasion techniques.
Training & Fine-Tuning Attacks
Methodology for data poisoning, trojan/backdoor insertion, clean-label attacks, LoRA backdoors, sleeper agent techniques, and model merging attacks targeting the LLM training pipeline.
Synthetic Data Poisoning
Attacking synthetic data generation pipelines to produce poisoned training sets, including generator manipulation, prompt poisoning, and contamination amplification.
Poisoning Fine-Tuning Datasets
Techniques for inserting backdoor triggers into fine-tuning datasets, clean-label poisoning that evades content filters, and scaling attacks across dataset sizes -- how adversarial training data compromises model behavior.
Preference Data Poisoning
How adversaries manipulate human preference data used in RLHF and DPO training -- compromising labelers, generating synthetic poisoned preferences, and attacking the preference data supply chain.
AI Supply Chain Security Overview
Comprehensive overview of the AI/ML supply chain attack surface, covering model poisoning, data poisoning, dependency attacks, and risk assessment frameworks aligned with OWASP LLM03:2025.
Manipulating Feature Stores
Advanced techniques for attacking feature stores used in ML systems, including feature poisoning, schema manipulation, serving layer exploitation, and integrity attacks against platforms like Feast, Tecton, and Databricks Feature Store.
Training Data Integrity
Defense-focused guide to ensuring training data has not been poisoned, covering label flipping, backdoor insertion, clean-label attacks, data validation pipelines, provenance tracking, and anomaly detection.
CTF: RAG Infiltrator
Poison a RAG system to return attacker-controlled content for specific queries. Scored on precision of targeting and stealth of the injected documents.
Feature Poisoning Attacks
Techniques for poisoning feature store data to manipulate model behavior: direct feature value manipulation, time-travel attacks, online/offline store consistency exploitation, and targeted entity-level feature poisoning.
Indirect Prompt Injection
How attackers embed malicious instructions in external data sources that LLMs process, enabling attacks without direct access to the model's input.
RAG, Data & Training Attacks
Overview of attacks targeting the data layer of AI systems, including RAG poisoning, training data manipulation, and data extraction techniques.
Training Data Manipulation
Attacks that corrupt model behavior by poisoning training data, fine-tuning datasets, or RLHF preference data, including backdoor installation and safety alignment removal.
Data Poisoning at Scale
Techniques for poisoning training data at scale to influence model behavior across broad capabilities.
SFT Data Poisoning & Injection
Poisoning supervised fine-tuning datasets through instruction-response pair manipulation, backdoor triggers in SFT data, and determining minimum poisoned example thresholds.
Training Pipeline Security
Security of the full AI model training pipeline, covering pre-training attacks, fine-tuning and alignment manipulation, architecture-level vulnerabilities, and advanced training-time threats.
Poisoning Attacks on Synthetic Training Data
Comprehensive analysis of poisoning vectors in synthetic data generation pipelines, from teacher model manipulation to post-generation filtering evasion.
Chapter Assessment: Training Pipeline
A 15-question calibrated assessment testing your understanding of training pipeline security -- data poisoning, RLHF manipulation, and architecture-level attacks.
Just 250 Poisoned Documents: Anthropic's Data Poisoning Breakthrough
Anthropic, the UK AI Safety Institute, and the Alan Turing Institute demonstrated that injecting just 250 malicious documents into pretraining data is enough to backdoor large language models ranging from 600 million to 13 billion parameters. This article examines the implications for model security.
Lessons from Fine-Tuning Safety Research
Key lessons from fine-tuning safety research -- covering alignment erosion, backdoor implantation, data poisoning, gaps in safety evaluation, and defense strategies for fine-tuning pipelines.
Training Data Attacks
Attacks that manipulate the data used to train or fine-tune models -- covering data poisoning, backdoor implantation, RLHF manipulation, and fine-tuning exploitation.