# safety-evaluation
4 articles tagged with “safety-evaluation”
## Case Study: DeepSeek Model Safety Evaluation Findings
Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.
## LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate other LLM outputs for safety, including sequential and parallel designs, judge prompt engineering, and techniques for attacking judge models.
## Fine-Tuning Safety Evaluation Framework
A comprehensive framework for evaluating the safety of fine-tuned models, combining pre-deployment testing, safety regression benchmarks, and continuous monitoring to detect when fine-tuning has compromised model safety.
## Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.