# sleeper-agents
6 articles tagged with “sleeper-agents”
Case Study: Sleeper Agents Research Impact
Analysis of Hubinger et al. 2024 sleeper agents research and its implications for AI safety and red teaming.
Sleeper Agent Models
Anthropic's research on models that behave differently when triggered by specific conditions: deceptive alignment, conditional backdoors, training-resistant deceptive behaviors, and implications for AI safety.
Sleeper Agent Research
Current research on training deceptive LLMs whose behavior persists through safety training, and on their activation patterns.
Sleeper Agents: Training-Time Backdoors
Comprehensive analysis of Hubinger et al.'s sleeper agents research (Anthropic, Jan 2024): how backdoors persist through safety training, why they are most persistent in larger models, detection via linear probes, and implications for AI safety and red teaming.
AI Supply Chain Deep Dive
Deep analysis of AI supply chain security threats including sleeper agents, slopsquatting, malicious model uploads, pickle deserialization exploits, and model provenance verification challenges.
Sleeper Agent Detection Walkthrough
Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.