# benchmarks
11 articles tagged with “benchmarks”
## Case Study: DeepSeek Model Safety Evaluation Findings

Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.

## Community Benchmark Results

Community-submitted benchmark results for attack and defense effectiveness across models.

## Benchmark Gaming Attacks

Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.

## Evaluating Defense Effectiveness

Metrics, benchmarks, and methodology for measuring how well AI defenses work against real attacks, including evaluation pitfalls and best practices.

## AI Agent Safety Evaluation Frameworks

Comprehensive review of agent safety benchmarks, including SWE-bench safety, AgentBench, and custom evaluation suites.

## LLM Agent Safety Benchmarks

Survey of agent safety benchmarks and evaluation frameworks for assessing autonomous AI system risks.

## AI Safety Benchmarks & Evaluation

Overview of AI safety evaluation: benchmarking frameworks, safety metrics, evaluation methodologies, and the landscape of standardized assessment tools for AI red teaming.

## Injection Benchmark Design

Designing robust benchmarks for evaluating injection attack and defense effectiveness.

## Evaluation Benchmark Gaming

Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.

## HarmBench Evaluation Framework Walkthrough

Complete walkthrough of the HarmBench evaluation framework: installation, running standardized benchmarks against models, interpreting results, creating custom behavior evaluations, and comparing model safety across versions.

## Inspect AI Walkthrough

Complete walkthrough of UK AISI's Inspect AI framework: installation, writing evaluations, running against models, custom scorers, benchmark suites, and producing compliance-ready reports.