# classifiers
3 articles tagged with “classifiers”

## Constitutional Classifiers

Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood 3,000+ hours of adversarial red teaming.

Tags: constitutional-classifiers, defense, jailbreak-defense, anthropic, classifiers, constitutional-ai

## Evasion Techniques for AI Classifiers

Advanced techniques for evading input/output safety classifiers in LLM applications.

Tags: tradecraft, evasion, classifiers, techniques

## Setting Up Content Filtering

Step-by-step walkthrough for implementing multi-layer content filtering for AI applications: keyword filtering, classifier-based detection, LLM-as-judge evaluation, effectiveness testing, and tuning for production.

Tags: content-filtering, defense, classifiers, moderation, llm-judge, implementation, walkthrough