# sleeper-agent
3 articles tagged with “sleeper-agent”
Training & Fine-Tuning Attacks
Methodology for data poisoning, trojan/backdoor insertion, clean-label attacks, LoRA backdoors, sleeper agent techniques, and model merging attacks targeting the LLM training pipeline.
training, fine-tuning, data-poisoning, backdoor, trojan, lora, sleeper-agent, model-merging
Time Bomb Defusal: Sleeper Agent Detection
Detect and neutralize a sleeper agent behavior trigger hidden in a fine-tuned model before it activates.
labs, ctf, sleeper-agent, detection
Emergence & Capability Jump Exploitation
How emergent capabilities create unpredictable security properties: testing for hidden capabilities, sleeper agent scenarios, deceptive alignment concerns, and capability elicitation.
emergence, capability, deceptive-alignment, sleeper-agent, hidden-capability, scaling