# safety-evaluation
4 articles tagged "safety-evaluation"
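Case Study: DeepSeek Model Safety Evaluation Findings
A complete analysis of the safety evaluation findings for DeepSeek models, including comparative evaluations against GPT-4 and Claude, jailbreak vulnerability testing, and implications for deploying open-weight models.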
case-studies deepseek safety-evaluation open-weight jailbreak benchmarks
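LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate the safety of other LLMs' outputs, covering sequential and parallel designs, judge prompt engineering, and techniques for attacking the judge model.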
llm-judge safety-evaluation defense-architecture adversarial judge-bypass
Fine-Tuning Safety Evaluation Framework
A comprehensive framework for evaluating the safety of fine-tuned models -- combining pre-deployment testing, safety regression benchmarks, and continuous monitoring to detect when fine-tuning has compromised model safety.
safety-evaluation regression-testing benchmarking monitoring fine-tuning-security safety-framework
Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
lab model-comparison safety-evaluation benchmarking