# classifier
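16 articles tagged "classifier"
Prompt Shield and Injection Detection
How Azure Prompt Shield and dedicated injection detection models work, their detection patterns based on fine-tuned classifiers, and systematic approaches to bypassing them.
Training Prompt Injection Classifiers
Methodology for training and evaluating ML classifiers that detect prompt injection attempts with high accuracy.
ML-Based Prompt Injection Detection
Build and train ML classifiers for real-time prompt injection detection.
Custom Safety Classifier Training
Train a custom input safety classifier, then develop payloads that reliably evade it to understand classifier limitations.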
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbations, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Building an Input Safety Classifier
Build a production-quality input classifier that detects prompt injection attempts.
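Constitutional Classifier Setup
Step-by-step walkthrough for implementing a Constitutional AI-style classifier that evaluates LLM outputs against a set of principles, covering principle definition, classifier training, chain-of-thought evaluation, and deployment.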
Output Content Classifier
Step-by-step walkthrough for building a classifier to filter harmful LLM outputs, covering taxonomy definition, multi-label classification, threshold calibration, and deployment as a real-time output gate.
Prompt Classifier Training
Step-by-step walkthrough for training a machine learning classifier to detect malicious prompts, covering dataset curation, feature engineering, model selection, training pipeline, evaluation, and deployment as a real-time detection service.
ML-Based Prompt Injection Detection Systems
Walkthrough for building and deploying ML-based prompt injection detection systems, covering training data collection, feature engineering, model architecture selection, threshold tuning, production deployment, and continuous improvement.
Training a Prompt Injection Classifier
Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.