# safety-classifier

3 articles tagged "safety-classifier"

Lab: Safety Classifier Model Extraction

Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.

Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.

Train a custom safety classifier tuned to your application's specific threat model and content policy.