# benchmark
8 articles tagged "benchmark"
**Community Project: Benchmark Suite**
A community-developed benchmark suite for evaluating LLM security, covering injection, exfiltration, jailbreaking, and agent exploitation attack classes.

**Monthly Competition: Model Breaker**
Monthly competitions focused on discovering novel jailbreak techniques against updated model versions, with community-validated scoring.

**HarmBench: Standardized Red Team Evaluation**
A deep dive into the HarmBench framework for standardized red team evaluation: attack methods, the evaluation pipeline, running benchmarks, interpreting results, and comparing model safety across providers.

**Lab: Create a Safety Benchmark**
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

**Benchmark Suite Comparison**
A comparison of AI safety benchmark suites, including HarmBench, JailbreakBench, and custom evaluation frameworks, with coverage analysis.

**Security Benchmark Runner Development**
Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

**HarmBench Evaluation Walkthrough**
Run standardized attack evaluations using the HarmBench framework and interpret results.

**JailbreakBench Usage and Submission**
Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.