Data & Training Security
Security vulnerabilities in the AI data pipeline, covering RAG exploitation, training data attacks, model extraction and intellectual property theft, and privacy attacks against deployed models.
The security of an AI system depends not only on how it processes inputs at inference time, but on every piece of data it was trained on, every document in its knowledge base, and every interaction pattern that reveals information about its training data or architecture. This section covers the full spectrum of data-layer vulnerabilities: from poisoning the knowledge that models retrieve in real time, to compromising the training process itself, to extracting proprietary information from deployed models.
Data and training attacks are often higher-impact and harder to detect than inference-time prompt injection. A poisoned training example can create a persistent backdoor that activates only under specific conditions. A compromised RAG pipeline can feed malicious instructions to the model through what appears to be trusted enterprise knowledge. Model extraction can replicate months of proprietary training at a fraction of the original cost. Privacy attacks can reveal whether specific individuals' data was used for training -- a finding with significant regulatory implications.
The Data Security Landscape
AI systems process data at multiple stages, and each stage presents distinct security concerns. Understanding this pipeline is essential for comprehensive threat assessment.
Retrieval-Augmented Generation (RAG) has become the dominant architecture for grounding LLM outputs in enterprise data. RAG systems retrieve relevant documents from a knowledge base and include them in the model's context window. This creates a powerful injection surface: any attacker who can place content in the knowledge base (through shared document repositories, web scraping sources, or compromised data feeds) can inject instructions that the model will process as trusted context. Knowledge poisoning attacks against RAG systems are among the most practical and impactful data-layer attacks because they require no access to the model itself -- only to the data sources it consumes.
Training-time attacks target the model's learning process. Data poisoning inserts malicious examples into training datasets, teaching the model to behave normally on clean inputs but produce attacker-desired outputs on triggered inputs. RLHF manipulation exploits the human feedback loop that aligns models with human preferences, steering alignment in attacker-favorable directions. These attacks require access to the training pipeline but produce persistent, hard-to-detect compromises.
Model extraction attacks treat the deployed model as a black box and systematically query it to reconstruct its capabilities. Through carefully crafted queries, an attacker can build a functionally equivalent copy of a proprietary model, effectively stealing intellectual property worth millions in training compute. Side-channel attacks can reveal architectural details, and watermark evasion techniques can strip provenance markers from stolen models.
Privacy attacks extract information about the model's training data from its outputs. Membership inference determines whether specific data points were in the training set. Model inversion reconstructs training examples from model outputs. PII extraction targets personally identifiable information that the model memorized during training. These attacks have direct regulatory implications under GDPR, CCPA, and similar data protection frameworks.
Attack Impact Comparison
| Attack Category | Access Required | Persistence | Detection Difficulty | Regulatory Impact |
|---|---|---|---|---|
| RAG poisoning | Knowledge base access | Session-level | Medium | Medium |
| Data poisoning | Training pipeline access | Permanent until retrained | Very high | High |
| Model extraction | API access only | N/A (external copy) | Medium | High (IP theft) |
| Privacy attacks | API access only | N/A (information disclosure) | Low | Very high (GDPR/CCPA) |
What You'll Learn in This Section
- RAG Pipeline Exploitation -- Knowledge poisoning, retrieval manipulation, embedding space attacks, and techniques for injecting malicious content through document retrieval systems
- Training & Fine-Tuning Attacks -- Data poisoning methods, RLHF manipulation, backdoor design, clean-label poisoning, federated learning attacks, and synthetic data poisoning
- Model Extraction & IP Theft -- API-based extraction techniques, side-channel attacks for architecture inference, watermark evasion, and intellectual property protection bypass
- Privacy & Data Protection -- PII extraction from model outputs, membership inference attacks, model inversion techniques, and their implications for data protection compliance
Prerequisites
To get the most from this section, you should understand:
- How LLMs work -- Transformer architecture and inference processes from How LLMs Work
- Embeddings and vector systems -- How documents are embedded and retrieved in RAG architectures from Embeddings & Vector Systems
- Basic ML concepts -- Training loops, loss functions, and gradient descent at a conceptual level
- Prompt injection basics -- How injected instructions are processed, covered in Prompt Injection