Constitutional Classifiers
Anthropic's Constitutional Classifiers defense: using constitutional AI principles (a natural-language constitution of permitted and restricted content) to train input and output classifiers that withstood more than 3,000 hours of adversarial red teaming.
Tags: constitutional-classifiers, defense, jailbreak-defense, anthropic, classifiers, constitutional-ai
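The input/output split can be pictured as a wrapper around generation: one classifier screens the prompt, another screens the response as it is produced. This is a minimal sketch, not Anthropic's implementation. The names `guarded_generate` and `GuardResult`, the 0.5 threshold, the choice to re-score the whole output prefix after every token, and the keyword-based toy classifiers are all hypothetical stand-ins; in the actual defense the classifiers are trained models, not keyword matchers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical classifier signature: text in, harm score in [0, 1] out.
Classifier = Callable[[str], float]


@dataclass
class GuardResult:
    text: str      # output released to the user
    blocked: bool  # True if either classifier stopped the exchange
    stage: str     # "input", "output", or "none"


def guarded_generate(
    prompt: str,
    generate_tokens: Callable[[str], Iterable[str]],
    input_classifier: Classifier,
    output_classifier: Classifier,
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> GuardResult:
    """Screen the prompt, then stream tokens and re-score the growing output."""
    if input_classifier(prompt) >= threshold:
        return GuardResult(text="", blocked=True, stage="input")

    emitted: List[str] = []
    for token in generate_tokens(prompt):
        candidate = "".join(emitted) + token
        # Score the full prefix so a harmful continuation is cut off mid-stream;
        # only tokens that passed the check so far are released.
        if output_classifier(candidate) >= threshold:
            return GuardResult(text="".join(emitted), blocked=True, stage="output")
        emitted.append(token)
    return GuardResult(text="".join(emitted), blocked=False, stage="none")


# Toy stand-ins (hypothetical): keyword scorers and a scripted "model".
def toy_input_classifier(text: str) -> float:
    return 1.0 if "forbidden" in text.lower() else 0.0


def toy_output_classifier(text: str) -> float:
    return 1.0 if "secret" in text.lower() else 0.0


def toy_model(prompt: str) -> Iterable[str]:
    if "reveal" in prompt:
        yield from ["The ", "answer ", "is ", "secret"]
    else:
        yield from ["Hi", " there"]


if __name__ == "__main__":
    print(guarded_generate("hello", toy_model, toy_input_classifier, toy_output_classifier))
```

Re-scoring the full prefix keeps the sketch short but repeats work on ever-longer strings; it is chosen here for clarity, not as a claim about how the deployed output classifier is evaluated.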