# benchmark

16 articlestagged with “benchmark”

Community Project: Benchmark Suite

Community-developed benchmark suite for evaluating LLM security that covers injection, exfiltration, jailbreaking, and agent exploitation attack classes.

communityprojectbenchmarkevaluation

Intermediate

Monthly Competition: Model Breaker

Monthly competitions focused on discovering novel jailbreak techniques against updated model versions, with community-validated scoring.

communitycompetitionjailbreakbenchmark

Advanced

HarmBench: Standardized Red Team Evaluation

Deep dive into the HarmBench framework for standardized red team evaluation: attack methods, evaluation pipeline, running benchmarks, interpreting results, and comparing model safety across providers.

harmbenchbenchmarkevaluationstandardized

Advanced

Lab: Create a Safety Benchmark

Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

labexpertbenchmarkevaluationsafetyhands-on

Expert

Benchmark Suite Comparison

Comparison of AI safety benchmark suites including HarmBench, JailbreakBench, and custom evaluation frameworks with coverage analysis.

referencebenchmarkcomparisonevaluation

Intermediate

Security Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

benchmarkrunnertoolsdevelopmentwalkthroughs

Intermediate

HarmBench Evaluation Walkthrough

Run standardized attack evaluations using the HarmBench framework and interpret results.

walkthroughsharmbenchevaluationbenchmark

Advanced

JailbreakBench Usage and Submission

Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.

walkthroughsjailbreakbenchbenchmarktesting

Intermediate

Community Project: Benchmark Suite

Community-developed benchmark suite for evaluating LLM security that covers injection, exfiltration, jailbreaking, and agent exploitation attack classes.

communityprojectbenchmarkevaluation

Intermediate

Monthly Competition: 模型 Breaker

Monthly competitions focused on discovering novel jailbreak techniques against updated model versions, with community-validated scoring.

communitycompetitionjailbreakbenchmark

Advanced

HarmBench：標準化紅隊評估

深入解析 HarmBench 框架之標準化紅隊評估：攻擊方法、評估管線、執行基準、結果解讀，以及跨供應商的模型安全比較。

harmbenchbenchmarkevaluationstandardized

Advanced

實驗室: Create a Safety Benchmark

labexpertbenchmarkevaluationsafetyhands-on

Expert

Benchmark Suite Comparison

Comparison of AI safety benchmark suites including HarmBench, 越獄Bench, and custom evaluation frameworks with coverage analysis.

referencebenchmarkcomparisonevaluation

Intermediate

安全 Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

benchmarkrunnertoolsdevelopmentwalkthroughs

Intermediate

HarmBench Evaluation 導覽

Run standardized attack evaluations using the HarmBench framework and interpret results.

walkthroughsharmbenchevaluationbenchmark

Advanced

越獄Bench Usage and Submission

Use 越獄Bench to evaluate jailbreak techniques and submit results to the benchmark.

walkthroughsjailbreakbenchbenchmarktesting

Intermediate