# benchmarks
Articles tagged "benchmarks"
Case Study: DeepSeek Model Safety Evaluation Findings
Comprehensive analysis of safety evaluation findings for DeepSeek models, including comparative assessments against GPT-4 and Claude, jailbreak susceptibility testing, and implications for open-weight model deployment.
Community Benchmark Results
Community-submitted benchmark results for attack and defense effectiveness across models.
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Evaluating Defense Effectiveness
Metrics, benchmarks, and methodology for measuring how well AI defenses work against real attacks, including evaluation pitfalls and best practices.
AI Agent Safety Evaluation Frameworks
Comprehensive review of agent safety benchmarks including SWE-bench safety, AgentBench, and custom evaluation suites.
LLM Agent Safety Benchmarks
Survey of agent safety benchmarks and evaluation frameworks for assessing autonomous AI system risks.
AI Safety Benchmarks & Evaluation
Overview of AI safety evaluation: benchmarking frameworks, safety metrics, evaluation methodologies, and the landscape of standardized assessment tools for AI red teaming.
Injection Benchmark Design
Designing robust benchmarks for evaluating injection attack and defense effectiveness.
Evaluation Benchmark Gaming
Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.
HarmBench Evaluation Framework Walkthrough
Complete walkthrough of the HarmBench evaluation framework: installation, running standardized benchmarks against models, interpreting results, creating custom behavior evaluations, and comparing model safety across versions.
Inspect AI Walkthrough
Complete walkthrough of UK AISI's Inspect AI framework: installation, writing evaluations, running against models, custom scorers, benchmark suites, and producing compliance-ready reports.