# data-training
28 articles tagged with “data-training”
Annotation Manipulation Attacks
Compromising model training by manipulating annotations and labels during the data labeling process, including insider and crowdsourcing attacks.
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Clean-Label Poisoning Attacks
Creating poisoned training samples that maintain correct labels while still influencing model behavior through subtle feature manipulation.
Continual Learning Vulnerabilities
Exploiting continual learning mechanisms to introduce vulnerabilities through incremental model updates that never trigger safety evaluations.
Curriculum Learning Exploitation
Exploiting curriculum learning strategies by manipulating the order and pacing of training data presentation to amplify poisoning effectiveness.
Data Augmentation Attacks
Exploiting automated data augmentation pipelines to amplify poisoned samples or introduce adversarial patterns through augmentation transformations.
Data Deduplication Attacks
Exploiting and evading the deduplication steps used to clean training pipelines so that poisoned samples survive preprocessing.
Data Poisoning Strategies
Comprehensive strategies for poisoning training data to influence model behavior at scale.
Data Provenance Attacks
Compromising training data integrity by attacking provenance tracking systems, falsifying data lineage, and manipulating data pipeline metadata.
Dataset Inference Attacks
Inferring properties of the training dataset through model behavior analysis.
Federated Learning Poisoning
Attacking federated learning systems by submitting poisoned gradient updates from compromised participants while evading Byzantine-robust aggregation.
Gradient-Based Data Poisoning
Using gradient information to craft maximally effective poisoned samples that achieve attacker objectives with minimal data modification.
Knowledge Distillation Attacks
Attacking knowledge distillation pipelines to transfer backdoors from teacher to student models or extract proprietary capabilities.
Label Flipping Attacks
Using label flipping to subtly alter model classification behavior during supervised fine-tuning.
Membership Inference Defenses
Evaluating and implementing defenses against membership inference attacks that determine whether specific samples were in a model's training set.
Practical Membership Inference Attacks
A hands-on guide to conducting membership inference attacks against deployed language models.
Model Inversion Attacks
Recovering sensitive training data features through model inversion techniques.
Model Memorization Patterns
Understanding when and why language models memorize training data, and techniques for detecting and exploiting memorization behavior.
Preference Dataset Attacks
Attacking preference datasets used for DPO and RLHF training to shift model behavior toward attacker-desired response patterns.
Privacy Attacks on Embeddings
Recovering sensitive information from embedding vectors through inversion attacks, attribute inference, and reconstruction techniques.
Attacks on Privacy-Preserving ML
Attacking differential privacy, federated learning, and other privacy-preserving ML mechanisms.
Reinforcement Feedback Poisoning
Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.
Synthetic Data Security Risks
Security implications of using synthetic data for model training, including inherited biases, poisoning propagation, and privacy leakage.
Advanced Training Data Extraction
In-depth techniques for extracting memorized training data from language models.
Training Data Watermark Attacks
Attacking and evading watermarking schemes designed to detect training data usage and enforce data licensing compliance.
Trigger-Based Backdoor Attacks
Implementing backdoor attacks using specific trigger patterns that activate pre-programmed model behavior while remaining dormant under normal conditions.
Web Crawl Poisoning at Scale
Strategic placement of adversarial content on the internet to influence web-crawled training datasets used by large language models.
Web-Scale Data Poisoning
Techniques for poisoning web-scale training data through targeted content manipulation.