# benchmarks
Articles tagged "benchmarks"
Case Study: DeepSeek Model Safety Evaluation Findings
Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.
Community Benchmark Results
Community-submitted benchmark results for attack and defense effectiveness across models.
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Evaluating Defense Effectiveness
Metrics, benchmarks, and methodology for measuring how well AI defenses work against real attacks, including evaluation pitfalls and best practices.
AI Agent Safety Evaluation Frameworks
Comprehensive review of agent safety benchmarks including SWE-bench safety, AgentBench, and custom evaluation suites.
LLM Agent Safety Benchmarks
Survey of agent safety benchmarks and evaluation frameworks for assessing autonomous AI system risks.
AI Safety Benchmarks & Evaluation
Overview of AI safety evaluation: benchmarking frameworks, safety metrics, evaluation methodologies, and the landscape of standardized assessment tools for AI red teaming.
Injection Benchmark Design
Designing robust benchmarks for evaluating injection attack and defense effectiveness.
Evaluation Benchmark Gaming
Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.
HarmBench Evaluation Framework Walkthrough
Complete walkthrough of the HarmBench evaluation framework: installation, running standardized benchmarks against models, interpreting results, creating custom behavior evaluations, and comparing model safety across versions.
Inspect AI Walkthrough
Complete walkthrough of UK AISI's Inspect AI framework: installation, writing evaluations, running against models, custom scorers, benchmark suites, and producing compliance-ready reports.