# classifiers
3 articles tagged with “classifiers”
Constitutional Classifiers
Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood 3,000+ hours of adversarial red teaming.
constitutional-classifiers defense jailbreak-defense anthropic classifiers constitutional-ai
Evasion Techniques for AI Classifiers
Advanced techniques for bypassing input/output safety classifiers in LLM applications.
tradecraft evasion classifiers techniques
Setting Up Content Filtering
Step-by-step walkthrough for implementing multi-layer content filtering for AI applications: keyword filtering, classifier-based detection, LLM-as-judge evaluation, testing effectiveness, and tuning for production.
content-filtering defense classifiers moderation llm-judge implementation walkthrough