# safety-evaluation
8 articles tagged "safety-evaluation"
Case Study: DeepSeek Model Safety Evaluation Findings
Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.
LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate other LLM outputs for safety, including sequential and parallel designs, judge prompt engineering, and techniques for attacking judge models.
Fine-Tuning Safety Evaluation Framework
A comprehensive framework for evaluating the safety of fine-tuned models -- combining pre-deployment testing, safety regression benchmarks, and continuous monitoring to detect when fine-tuning has compromised model safety.
Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
Case Study: DeepSeek Model Safety Evaluation Findings
Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.
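LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate other LLM outputs for safety, including sequential and parallel designs, judge prompt engineering, and techniques for attacking judge models.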
Fine-Tuning Safety Evaluation Framework
A comprehensive framework for evaluating the safety of fine-tuned models -- combining pre-deployment testing, safety regression benchmarks, and continuous monitoring to detect when fine-tuning has compromised model safety.
Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.