redteams.ai

# jailbreak-defense

1 article tagged with “jailbreak-defense”

Constitutional Classifiers

Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood 3,000+ hours of adversarial red teaming.

constitutional-classifiers, defense, jailbreak-defense, anthropic, classifiers, constitutional-ai
Intermediate