# data-training
28 articles tagged “data-training”
Annotation manipulation attacks
Compromising model training by manipulating annotations and labels during the data labeling process, including insider and crowdsourcing attacks.
Benchmark gaming attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Clean-label poisoning attacks
Creating poisoned training samples that maintain correct labels while still influencing model behavior through subtle feature manipulation.
Continual learning vulnerabilities
Exploiting continual learning update mechanisms to introduce vulnerabilities through incremental model updates without triggering safety evaluations.
Curriculum learning exploitation
Exploiting curriculum learning strategies by manipulating the order and pacing of training data presentation to amplify poisoning effectiveness.
Data augmentation attacks
Exploiting automated data augmentation pipelines to amplify poisoned samples or introduce adversarial patterns through augmentation transformations.
Data deduplication attacks
Exploiting and evading the deduplication steps used to clean training-pipeline data so that poisoned samples survive preprocessing.
Data poisoning strategies
Comprehensive strategies for poisoning training data to influence model behavior at scale.
Data provenance attacks
Compromising training data integrity by attacking provenance tracking systems, falsifying data lineage, and manipulating data pipeline metadata.
Dataset inference attacks
Inferring properties of the training dataset through model behavior analysis.
Federated learning poisoning
Attacking federated learning systems by submitting poisoned gradient updates from compromised participants while evading Byzantine-robust aggregation.
Gradient-based data poisoning
Using gradient information to craft maximally effective poisoned samples that achieve attacker objectives with minimal data modification.
Knowledge distillation attacks
Attacking knowledge distillation pipelines to transfer backdoors from teacher to student models or extract proprietary capabilities.
Label flipping attacks
Using label flipping to subtly alter model classification behavior during supervised fine-tuning.
Membership inference defenses
Evaluating and implementing defenses against membership inference attacks that determine whether specific samples were in a model's training set.
Practical membership inference attacks
Practical guide to conducting membership inference attacks against deployed language models.
Model inversion attacks
Recovering sensitive training data features through model inversion techniques.
Model memorization patterns
Understanding when and why language models memorize training data, and techniques for detecting and exploiting memorization behavior.
Preference dataset attacks
Attacking preference datasets used for DPO and RLHF training to shift model behavior toward attacker-desired response patterns.
Embedding privacy attacks
Recovering sensitive information from embedding vectors through inversion attacks, attribute inference, and reconstruction techniques.
Attacks on privacy-preserving ML
Attacking differential privacy, federated learning, and other privacy-preserving ML mechanisms.
Reinforcement feedback poisoning
Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.
Security risks of synthetic data
Security implications of using synthetic data for model training, including inherited biases, poisoning propagation, and privacy leakage.
Advanced training data extraction
Advanced techniques for extracting memorized training data from language models.
Attacks on training data watermarks
Attacking and evading watermarking schemes designed to detect training data usage and enforce data licensing compliance.
Trigger-based backdoor attacks
Implementing backdoor attacks using specific trigger patterns that activate pre-programmed model behavior while remaining dormant under normal conditions.
Large-scale web crawl poisoning
Strategic placement of adversarial content on the internet to influence web-crawled training datasets used by large language models.
Web-scale data poisoning
Techniques for poisoning web-scale training data through targeted content manipulation.