# benchmarks
11 articles tagged with “benchmarks”
## Case Study: DeepSeek Model Safety Evaluation Findings

Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.

## Community Benchmark Results

Community-submitted benchmark results for attack and defense effectiveness across models.

## Benchmark Gaming Attacks

Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.

## Evaluating Defense Effectiveness

Metrics, benchmarks, and methodology for measuring how well AI defenses work against real attacks, including evaluation pitfalls and best practices.

## AI Agent Safety Evaluation Frameworks

Comprehensive review of agent safety benchmarks, including SWE-bench safety, AgentBench, and custom evaluation suites.

## LLM Agent Safety Benchmarks

Survey of agent safety benchmarks and evaluation frameworks for assessing autonomous AI system risks.

## AI Safety Benchmarks & Evaluation

Overview of AI safety evaluation: benchmarking frameworks, safety metrics, evaluation methodologies, and the landscape of standardized assessment tools for AI red teaming.

## Injection Benchmark Design

Designing robust benchmarks for evaluating injection attack and defense effectiveness.

## Evaluation Benchmark Gaming

Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.

## HarmBench Evaluation Framework Walkthrough

Complete walkthrough of the HarmBench evaluation framework: installation, running standardized benchmarks against models, interpreting results, creating custom behavior evaluations, and comparing model safety across versions.

## Inspect AI Walkthrough

Complete walkthrough of UK AISI's Inspect AI framework: installation, writing evaluations, running against models, custom scorers, benchmark suites, and producing compliance-ready reports.