# benchmark

8 articles tagged “benchmark”

Community project: benchmark suite

Community-developed benchmark suite for evaluating LLM security that covers injection, exfiltration, jailbreaking, and agent exploitation attack classes.

community, project, benchmark, evaluation

Intermediate

Monthly competition: Model Breaker

Monthly competitions focused on discovering novel jailbreak techniques against updated model versions, with community-validated scoring.

community, competition, jailbreak, benchmark

Advanced

HarmBench: standardized red team evaluation

Deep dive into the HarmBench framework for standardized red team evaluation: attack methods, the evaluation pipeline, running benchmarks, interpreting results, and comparing model safety across providers.

harmbench, benchmark, evaluation, standardized

Advanced

Lab: build a safety benchmark

Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

lab, expert, benchmark, evaluation, safety, hands-on

Expert

Comparison of benchmark suites

Comparison of AI safety benchmark suites such as HarmBench, JailbreakBench, and custom evaluation frameworks, with coverage analysis.

reference, benchmark, comparison, evaluation

Intermediate

Developing a security benchmark runner

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

benchmark, runner, tools, development, walkthroughs

Intermediate

Walkthrough: HarmBench evaluation

Run standardized attack evaluations using the HarmBench framework and interpret results.

walkthroughs, harmbench, evaluation, benchmark

Advanced

Using and submitting to JailbreakBench

Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.

walkthroughs, jailbreakbench, benchmark, testing

Intermediate