# classifier
16 articles tagged with “classifier”
Prompt Shields & Injection Detection
How Azure Prompt Shields and dedicated injection detection models work, how their fine-tuned classifiers detect attacks, and systematic approaches to bypassing them.
Training Prompt Injection Classifiers
Methodologies for training and evaluating ML classifiers that detect prompt injection attempts with high accuracy.
ML-Based Prompt Injection Detection
Building and training ML classifiers for real-time prompt injection detection.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Building an Input Safety Classifier
Build a production-quality input classifier that detects prompt injection attempts.
Constitutional Classifier Setup
Step-by-step walkthrough for implementing constitutional AI-style classifiers that evaluate LLM outputs against a set of principles, covering principle definition, classifier training, chain-of-thought evaluation, and deployment.
Output Content Classifier
Step-by-step walkthrough for building a classifier to filter harmful LLM outputs, covering taxonomy definition, multi-label classification, threshold calibration, and deployment as a real-time output gate.
Prompt Classifier Training
Step-by-step walkthrough for training a machine learning classifier to detect malicious prompts, covering dataset curation, feature engineering, model selection, training pipeline, evaluation, and deployment as a real-time detection service.
ML-Based Prompt Injection Detection Systems
Walkthrough for building and deploying ML-based prompt injection detection systems, covering training data collection, feature engineering, model architecture selection, threshold tuning, production deployment, and continuous improvement.
Training a Prompt Injection Classifier
Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.