# evaluation

48 articlestagged with “evaluation”

Skill Verification: Defense Effectiveness Evaluation

Practical verification of skills in evaluating guardrails, classifiers, and monitoring systems.

Capstone: Design and Implement an AI Safety Benchmark Suite

Build a comprehensive, reproducible benchmark suite for evaluating LLM safety across multiple risk dimensions including toxicity, bias, hallucination, and adversarial robustness.

capstonebenchmarkingsafetyevaluationtesting

Expert

Community Project: Benchmark Suite

Community-developed benchmark suite for evaluating LLM security that covers injection, exfiltration, jailbreaking, and agent exploitation attack classes.

# evaluation

Skill Verification: Defense Effectiveness Evaluation

Capstone: Design and Implement an AI Safety Benchmark Suite

Community Project: Benchmark Suite

Benchmark Gaming Attacks

Safety Layer Benchmarking Methodology

Defense Evaluation Methodology

Evaluating Defense Effectiveness

HarmBench: Standardized Red Team Evaluation

Inspect AI: UK AISI Evaluation Framework

promptfoo for Red Teaming

Result Scoring Systems

Evaluation Evasion in Fine-Tuning

Safety Regression Testing

Evaluation and Benchmarking Basics

Alignment Faking Detection Methods

Training Implications of Alignment Faking

LLM Agent Safety Benchmarks

Building Evaluation Harnesses

AI Safety Benchmarks & Evaluation

Red Team Metrics Beyond ASR

Statistical Rigor in AI Red Teaming

Governance & Compliance

Injection Benchmark Design

Injection Benchmarking Methodology

Lab: Evaluation Framework Gaming

HarmBench Custom Attack Submission

Setting Up Promptfoo for LLM Evaluation

Lab: Promptfoo Setup and First Eval

Lab: Comparing Red Team Testing Tools

Your First HarmBench Evaluation

Your First Inspect AI Evaluation

Lab: Build Guardrail Evaluator

Lab: Create a Safety Benchmark

Lab: Building an LLM Judge Evaluator

Cross-Model Comparison

Benchmarking Multimodal Model Safety

Benchmark Suite Comparison

Evaluation Benchmark Gaming

Evaluation Set Contamination Attacks

LLM Judge Implementation

HarmBench Evaluation Walkthrough

HarmBench Evaluation Framework Walkthrough

Inspect AI Walkthrough

Running Your First Promptfoo Evaluation

Automating Red Team Evaluations with Promptfoo

Promptfoo for Red Team Evaluation

Promptfoo End-to-End Walkthrough

Creating Custom Scorers for PyRIT Attack Evaluation

# evaluation

Skill Verification: Defense Effectiveness Evaluation

Capstone: Design and Implement an AI Safety Benchmark Suite

Community Project: Benchmark Suite

Benchmark Gaming Attacks

Safety Layer Benchmarking Methodology

Defense Evaluation Methodology

Evaluating Defense Effectiveness

HarmBench: Standardized Red Team Evaluation

Inspect AI: UK AISI Evaluation Framework

promptfoo for Red Teaming

Result Scoring Systems

Evaluation Evasion in Fine-Tuning

Safety Regression Testing

Evaluation and Benchmarking Basics

Alignment Faking Detection Methods

Training Implications of Alignment Faking

LLM Agent Safety Benchmarks

Building Evaluation Harnesses

AI Safety Benchmarks & Evaluation

Red Team Metrics Beyond ASR

Statistical Rigor in AI Red Teaming

Governance & Compliance

Injection Benchmark Design

Injection Benchmarking Methodology

Lab: Evaluation Framework Gaming

HarmBench Custom Attack Submission

Setting Up Promptfoo for LLM Evaluation

Lab: Promptfoo Setup and First Eval

Lab: Comparing Red Team Testing Tools

Your First HarmBench Evaluation