# tools

practice-examtoolsgarakpyrit

Tool Proficiency Practice Exam

Practice exam on Garak, PyRIT, Promptfoo, HarmBench, and custom harness development.

Automated Red Teaming Assessment

Assessment of automated attack generation tools including PAIR, TAP, GCG, and custom harness development.

assessmentautomatedtools

assessmenttoolsframeworksautomationred-teaming-tools

Tool Proficiency Assessment

Test your knowledge of AI red teaming tools, frameworks, automation platforms, and their appropriate application in security assessments with 9 intermediate-level questions.

skill-verificationautomatedtools

Skill Verification: Automated Red Teaming

Practical verification of automated attack generation using Garak, PyRIT, and Promptfoo.

assessmentsskill-verificationtoolspractical

Skill Verification: Tool Proficiency

Hands-on verification of proficiency with Garak, PyRIT, Promptfoo, and custom tooling.

challengeagentexploitationprompt-injectiontoolsmarch-2026

March 2026: Agent Exploitation Challenge

Compromise a multi-tool agent system through prompt injection and tool abuse, completing multiple objectives with escalating difficulty and point values.

communitytoolsspotlightopen-source

Community Tool Spotlight Series

Monthly spotlight on community-developed AI red teaming tools and their usage.

communityhackathontoolsdefense

Tool Building Hackathon: Defense Toolkit

Collaborative hackathon for building open-source defense tools including guardrails, filters, and monitoring components for LLM applications.

communityhackathontoolsforensics

Tool Building Hackathon: Forensics Suite

Community hackathon building forensic analysis tools for AI incident investigation, including log parsers, timeline reconstructors, and attribution aids.

communityhackathontoolsscanner

Tool Building Hackathon: Security Scanner

A community hackathon focused on building automated security scanning tools for LLM applications, with prizes for novel detection capabilities.

burp-suiteextensionsweb-securitytools

Burp Suite & AI Security Extensions

Using Burp Suite for AI API security testing: intercepting LLM API calls, AI-specific extensions, fuzzing AI endpoints, testing prompt injection via HTTP, and integrating web security methodology with AI red teaming.

custom-harnesspatternsarchitecturetools

Custom Harness Building Patterns

Design patterns for building custom AI red team harnesses: plugin architecture, result storage, async execution, multi-model support, converter pipelines, and production-grade orchestration.

專家

Garak: LLM Vulnerability Scanner

Deep dive into NVIDIA's Garak LLM vulnerability scanner: architecture, probes, generators, evaluators, custom probe development, and CI/CD integration for automated security scanning.

garakscannertoolsvulnerability

labtoolscomparisonmethodology

Lab: Tool Comparison — Same Target, 4 Tools

Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Evaluate coverage, speed, finding quality, and practical trade-offs of each tool.

promptfootestingevaluationtools

promptfoo for Red Teaming

Deep dive into promptfoo for AI red teaming: YAML configuration, assertion-based testing, red team plugins, custom evaluators, and regression testing workflows for LLM security.

pyritorchestrationmicrosofttools

PyRIT: Red Team Orchestration

Deep dive into Microsoft's PyRIT (Python Risk Identification Toolkit): orchestrators, scorers, converters, targets, multi-turn campaigns, and advanced red team configuration.

exploit-devreportingtoolsautomation

Reporting Tool Development

Building automated reporting tools that transform raw test results into professional assessment reports with reproducible findings.

agentstoolsreactlangchainintermediate

Agent Architectures & Tool Use Patterns

How ReAct, Plan-and-Execute, and LangGraph agent patterns work — tool definition, invocation, and result processing — and where injection happens in each architecture.

compliancetoolsrisk-assessmentauditmonitoring

AI Compliance Tools Overview

Overview of tools, methodologies, and frameworks for maintaining AI compliance, including risk assessment, audit methodology, and continuous compliance monitoring.

Lab: Setting Up Your Red Team Environment

Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.

labenvironmentsetuptools

Lab: Scanning with Garak

Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.

labgarakscanningtools

professionallab-setupinfrastructuretools

Setting Up an AI Red Team Lab Environment

Practical guide to designing and building a lab environment for AI red team testing, from hardware selection through tool configuration.

professionaltoolsdevelopmentautomation

Developing Custom AI Red Team Tools

Guide to designing, building, and maintaining custom tools for AI red team engagements.

professionalvendor-evaluationtoolsprocurement

Evaluating AI Security Vendors and Tools

Framework for assessing, comparing, and selecting AI security vendors, tools, and services for organizational needs.

professionalprocurementtoolsbudget

Tool Procurement Strategy

Strategic approach to evaluating, procuring, and maintaining AI security testing tools including cost-benefit analysis and vendor assessment.

capability-mappingrecontoolspermissionstradecraft

Mapping Model Capabilities

Systematic approaches to discovering and mapping the full capability surface of an AI system, including tools, integrations, permissions, and hidden features.

referencescheat-sheetsquick-referencecatalogscheckliststools

References & Quick Reference

Comprehensive collection of cheat sheets, quick references, catalogs, checklists, and comparison matrices for AI red teaming, covering attack techniques, defense bypasses, tools, frameworks, and compliance.

referencetoolscomparisonpyritgarakdeepteamautoredteamerharmbenchart

Automated Red Teaming Tools Comparison

Comprehensive comparison of automated AI red teaming tools including PyRIT, Garak, DeepTeam, AutoRedTeamer, HarmBench, and ART, with detailed capability matrices, strengths analysis, and use case recommendations.

referencetoolscomparisongarakpyritpromptfoo

Red Team Tool Comparison Matrix

Side-by-side comparison of AI red teaming tools -- Garak, PyRIT, promptfoo, Inspect AI, and HarmBench -- covering capabilities, use cases, and integration options.

toolscomparisongarakpyritpromptfooinspect-ai

Red Team Tool Comparison

Comparison of major AI red teaming tools -- Garak, PyRIT, promptfoo, and Inspect AI -- covering capabilities, strengths, limitations, and use cases.

tradecrafttoolsselectionmethodology

Tool Selection for AI Red Teaming

Framework for selecting and configuring tools for AI red team engagements based on target architecture, engagement scope, and team capabilities.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

Walkthroughs

Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and full engagement workflows.

langchainchainsagentstoolsragmemoryprompt-injectionwalkthrough

LangChain Application Security Testing

End-to-end walkthrough for security testing LangChain applications: chain enumeration, prompt injection through chains, tool and agent exploitation, retrieval augmented generation attacks, and memory manipulation.

benchmarkrunnertoolsdevelopmentwalkthroughs

Security Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

replaytoolsbuildingtoolattackwalkthroughs

Building an Attack Replay Tool

Build a tool that records and replays attack sequences for regression testing and defense validation.

customtoolsmutationwalkthroughsengine

Building a Custom Payload Mutation Engine

Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.

garakcustomtoolsprobeswalkthroughs

Building Custom Garak Probes (Tool Walkthrough)

Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.

customtoolsbehaviorsharmbenchwalkthroughs

HarmBench Custom Behavior Sets

Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

Tool Walkthroughs

End-to-end practical walkthroughs for essential AI red teaming tools, covering installation, configuration, execution, and result interpretation.

llmtoolstrafficanalyzerwalkthroughs

Building an LLM Traffic Analyzer

Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.

integrationtoolscicdpromptfoowalkthroughs

Promptfoo CI/CD Pipeline Integration

Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.

integrationtoolspyritscoringwalkthroughs

PyRIT Custom Scoring Integration

Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.

walkthroughstoolsagent-scannerdevelopment

Agent Security Scanner Development

Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.

walkthroughstoolsreport-generationautomation

Automated Red Team Report Generation

Build an automated system for generating structured red team reports from testing data and findings.

walkthroughstoolsattack-proxydevelopment

Building an LLM Attack Proxy

Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.

walkthroughstoolsbenchmarkingdefense

Defense Benchmarking Tool Development

Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.

walkthroughstoolsembedding-attackstoolkit

Building an Embedding Attack Toolkit

Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.

walkthroughstoolsdataset-curationjailbreaks

Jailbreak Dataset Curation Tool

Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.

walkthroughstoolstraffic-analysismonitoring

LLM Traffic Analysis Tool

Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.

walkthroughstoolsmcp-auditsecurity

MCP Security Audit Tool

Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.

walkthroughstoolstest-harnessmulti-model

Multi-Model Test Harness Construction

Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.

walkthroughstoolsmutation-frameworkpayloads

Payload Mutation Framework Development

Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.

walkthroughstoolsrag-testingframework

RAG Security Testing Framework

Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.

forensicstoolssoftwareinvestigation

Forensic 工具ing for AI Systems

概覽 of forensic tools and techniques specifically designed for AI system investigation including model analyzers, log parsers, and behavior profilers.

practice-examtoolsgarakpyrit

工具 Proficiency Practice Exam

Practice exam on Garak, PyRIT, Promptfoo, HarmBench, and custom harness development.

Automated 紅隊演練評量

評量 of automated attack generation tools including PAIR, TAP, GCG, and custom harness development.

assessmentautomatedtools

assessmenttoolsgarakpyritpromptfooevaluation

章節評量：工具熟練度

15 題校準評量，測試你對 AI 紅隊演練工具生態系的理解——Garak、PyRIT、Promptfoo 與自訂自動化。

skill-verificationautomatedtools

Skill Verification: Automated 紅隊演練

Practical verification of automated attack generation using Garak, PyRIT, and Promptfoo.

assessmentsskill-verificationtoolspractical

Skill Verification: 工具 Proficiency

Hands-on verification of proficiency with Garak, PyRIT, Promptfoo, and custom tooling.

labsetuptoolsgetting-started

建立你的 AI 紅隊實驗室

使用開源模型、測試框架與真實目標應用程式設定本地 AI 紅隊演練實驗室的實務指南。

AI 安全最新動態 — 2026 年 3 月

2026 年 3 月最重要 AI 安全發展、工具更新、研究亮點與新興攻擊向量的月度彙整。

roundupai-securitytoolsresearchregulationmonthly

2026 年 3 月：代理利用挑戰

經提示注入與工具濫用破壞多工具代理系統，以升級之難度與分數值完成多個目標。

challengeagentexploitationprompt-injectiontoolsmarch-2026

communitytoolsspotlightopen-source

Community 工具 Spotlight Series

Monthly spotlight on community-developed AI red teaming tools and their usage.

communityhackathontoolsdefense

工具 Building Hackathon: 防禦工具kit

Collaborative hackathon for building open-source defense tools including guardrails, filters, and monitoring components for LLM applications.

communityhackathontoolsforensics

工具 Building Hackathon: Forensics Suite

Community hackathon building forensic analysis tools for AI incident investigation, including log parsers, timeline reconstructors, and attribution aids.

communityhackathontoolsscanner

工具 Building Hackathon: 安全 Scanner

A community hackathon focused on building automated security scanning tools for LLM applications, with prizes for novel detection capabilities.

burp-suiteextensionsweb-securitytools

Burp Suite 與 AI 安全擴充

以 Burp Suite 進行 AI API 安全測試：攔截 LLM API 呼叫、AI 特定擴充、對 AI 端點模糊測試、以 HTTP 測試提示注入，以及將 Web 資安方法論與 AI 紅隊整合。

custom-harnesspatternsarchitecturetools

Custom Harness Building Patterns

Design patterns for building custom AI red team harnesses: plugin architecture, result storage, async execution, multi-model support, converter pipelines, and production-grade orchestration.

專家

Garak：LLM 漏洞掃描器

深入 NVIDIA 之 Garak LLM 漏洞掃描器：架構、探測、生成器、評估器、自訂探測開發，與自動化安全掃描之 CI/CD 整合。

garakscannertoolsvulnerability

labtoolscomparisonmethodology

實作：工具比較——同一目標，4 個工具

對同一目標模型比較 Garak、PyRIT、promptfoo 與 Inspect AI 之實作。評估各工具之覆蓋、速度、發現品質與實務取捨。

promptfootestingevaluationtools

promptfoo 用於紅隊

promptfoo 用於 AI 紅隊的深入解析：YAML 組態、以斷言為基礎的測試、紅隊外掛、自製評估器，以及 LLM 安全的回歸測試流程。

pyritorchestrationmicrosofttools

PyRIT：紅隊協作編排

深入探討 Microsoft PyRIT（Python Risk Identification Toolkit）：orchestrator、scorer、converter、target、多輪活動，以及進階紅隊組態。

exploit-devreportingtoolsautomation

Reporting 工具 Development

Building automated reporting tools that transform raw test results into professional assessment reports with reproducible findings.

agentstoolsreactlangchainintermediate

代理架構與工具使用模式

ReAct、Plan-and-Execute 與 LangGraph 代理模式如何運作——工具定義、呼叫與結果處理——以及注入於每個架構中發生之處。

compliancetoolsrisk-assessmentauditmonitoring

AI 合規工具概觀

用以維持 AI 合規的工具、方法論與框架概觀，包括風險評估、稽核方法論，以及持續性合規監控。

實作：設置你的紅隊環境

設置完整 AI 紅隊環境之逐步指南——Python、API 客戶端、掃描工具與本地模型。

labenvironmentsetuptools

實作：以 Garak 掃描

安裝並對本地模型執行 Garak LLM 漏洞掃描器。學習組態 probe、generator 與 evaluator 以進行完整掃描。

labgarakscanningtools

professionallab-setupinfrastructuretools

Setting Up an AI 紅隊實驗室 Environment

Practical guide to designing and building a lab environment for AI red team testing, from hardware selection through tool configuration.

professionaltoolsdevelopmentautomation

Developing Custom AI 紅隊工具s

指南 to designing, building, and maintaining custom tools for AI red team engagements.

professionalvendor-evaluationtoolsprocurement

Evaluating AI 安全 Vendors and 工具s

Framework for assessing, comparing, and selecting AI security vendors, tools, and services for organizational needs.

professionalprocurementtoolsbudget

工具 Procurement Strategy

Strategic approach to evaluating, procuring, and maintaining AI security testing tools including cost-benefit analysis and vendor assessment.

capability-mappingrecontoolspermissionstradecraft

繪製模型能力

有系統地發掘並繪製 AI 系統完整能力面的做法，涵蓋工具、整合、權限與隱藏功能。

referencescheat-sheetsquick-referencecatalogscheckliststools

參考資料與速查

AI 紅隊演練的速查表、快速參考、目錄、檢核清單與比較矩陣的完整集合，涵蓋攻擊技術、防禦繞過、工具、框架與合規。

referencetoolscomparisonpyritgarakdeepteamautoredteamerharmbenchart

Automated 紅隊演練工具s Comparison

referencetoolscomparisongarakpyritpromptfoo

紅隊工具比較矩陣

AI 紅隊工具的並列比較——Garak、PyRIT、promptfoo、Inspect AI 與 HarmBench——涵蓋能力、使用案例與整合選項。

toolscomparisongarakpyritpromptfooinspect-ai

紅隊工具比較

主要 AI 紅隊工具的比較——Garak、PyRIT、promptfoo 與 Inspect AI——涵蓋能力、強項、侷限與用途。

tradecrafttoolsselectionmethodology

工具 Selection for AI 紅隊ing

Framework for selecting and configuring tools for AI red team engagements based on target architecture, engagement scope, and team capabilities.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

逐步導覽

涵蓋紅隊工具、案件方法論、防禦實作、平台特定測試與完整案件工作流程的逐步引導式導覽。

langchainchainsagentstoolsragmemoryprompt-injectionwalkthrough

LangChain Application 安全 Testing

benchmarkrunnertoolsdevelopmentwalkthroughs

安全 Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

replaytoolsbuildingtoolattackwalkthroughs

Building an 攻擊 Replay 工具

Build a tool that records and replays attack sequences for regression testing and defense validation.

customtoolsmutationwalkthroughsengine

Building a Custom Payload Mutation Engine

Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.

garakcustomtoolsprobeswalkthroughs

Building Custom Garak Probes (工具導覽)

Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.

customtoolsbehaviorsharmbenchwalkthroughs

HarmBench Custom Behavior Sets

Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

工具導覽

必備 AI 紅隊演練工具的端對端實務導覽，涵蓋安裝、設定、執行與結果詮釋。

llmtoolstrafficanalyzerwalkthroughs

Building an LLM Traffic Analyzer

Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.

integrationtoolscicdpromptfoowalkthroughs

Promptfoo CI/CD Pipeline Integration

Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.

integrationtoolspyritscoringwalkthroughs

PyRIT Custom Scoring Integration

Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.

walkthroughstoolsagent-scannerdevelopment

代理安全 Scanner Development

Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.

walkthroughstoolsreport-generationautomation

Automated 紅隊 Report Generation

Build an automated system for generating structured red team reports from testing data and findings.

walkthroughstoolsattack-proxydevelopment

Building an LLM 攻擊 Proxy

Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.

walkthroughstoolsbenchmarkingdefense

防禦 Benchmarking 工具 Development

Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.

walkthroughstoolsembedding-attackstoolkit

Building an Embedding 攻擊工具kit

Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.

walkthroughstoolsdataset-curationjailbreaks

越獄 Dataset Curation 工具

Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.

walkthroughstoolstraffic-analysismonitoring

LLM Traffic Analysis 工具

Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.

walkthroughstoolsmcp-auditsecurity

MCP 安全 Audit 工具

Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.

walkthroughstoolstest-harnessmulti-model

Multi-模型 Test Harness Construction

Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.

walkthroughstoolsmutation-frameworkpayloads

Payload Mutation Framework Development

Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.

walkthroughstoolsrag-testingframework

RAG 安全 Testing Framework

Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.