# synthetic-data
9 articles tagged with “synthetic-data”
## Synthetic Data Security Risks
Security implications of using synthetic data for model training, including inherited biases, poisoning propagation, and privacy leakage.

## Synthetic Data Poisoning
Attacking synthetic data generation pipelines to produce poisoned training sets, including generator manipulation, prompt poisoning, and contamination amplification.

## Synthetic Data Poisoning in Training Pipelines
Research on poisoning synthetic data generation pipelines used for model training and fine-tuning.

## Advanced Training Attack Vectors
Cutting-edge training attacks: federated learning poisoning, model merging exploits, distributed training vulnerabilities, emergent capability risks, and synthetic data pipeline attacks.

## Synthetic Data Pipeline Attacks
Attacks on synthetic data generation pipelines: model collapse from synthetic feedback loops, poisoning synthetic data generators, quality control bypass, and data provenance attacks.

## Synthetic Data Risks
Model collapse from training on synthetic data, quality degradation across generations, distribution narrowing, minority erasure, and strategies for safe synthetic data usage in LLM training.

## Synthetic Data Pipeline Attacks (Training Pipeline)
Attacking synthetic data generation pipelines used for model training and augmentation.

## Synthetic Data Poisoning Vectors
Attack vectors specific to synthetic data generation pipelines used in model training and augmentation.

## Poisoning Attacks on Synthetic Training Data
Comprehensive analysis of poisoning vectors in synthetic data generation pipelines, from teacher model manipulation to post-generation filtering evasion.