# safety-evaluation
4 articles tagged with “safety-evaluation”
## Case Study: DeepSeek Model Safety Evaluation Findings
Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.
## LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate other LLM outputs for safety, including sequential and parallel designs, judge prompt engineering, and techniques for attacking judge models.
## Fine-Tuning Safety Evaluation Framework
A comprehensive framework for evaluating the safety of fine-tuned models, combining pre-deployment testing, safety regression benchmarks, and continuous monitoring to detect when fine-tuning has compromised model safety.
## Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.