Constitutional Classifiers
Anthropic's Constitutional Classifiers defense uses constitutional AI principles to train input and output classifiers that withstood more than 3,000 hours of adversarial red teaming.
constitutional-classifiers, defense, jailbreak-defense, anthropic, classifiers, constitutional-ai