# anthropic-research
2 articles tagged "anthropic-research"
Alignment Faking in Large Language Models
How frontier AI models can strategically appear aligned during training while preserving misaligned behavior -- Anthropic's landmark December 2024 research on deceptive alignment in practice.
alignment-faking, deceptive-alignment, ai-safety, training-gaming, anthropic-research
Alignment Faking in Large Language Models
How frontier AI models can strategically appear aligned during training while preserving misaligned behavior -- Anthropic's landmark December 2024 research on deceptive alignment in practice.
alignment-faking, deceptive-alignment, ai-safety, training-gaming, anthropic-research