# tools

practice-examtoolsgarakpyrit

Tool Proficiency Practice Exam

Practice exam on Garak, PyRIT, Promptfoo, HarmBench, and custom harness development.

Automated Red Teaming Assessment

Assessment of automated attack generation tools including PAIR, TAP, GCG, and custom harness development.

assessmentautomatedtools

assessmenttoolsframeworksautomationred-teaming-tools

Tool Proficiency Assessment

Test your knowledge of AI red teaming tools, frameworks, automation platforms, and their appropriate application in security assessments with 9 intermediate-level questions.

skill-verificationautomatedtools

Skill Verification: Automated Red Teaming

Practical verification of automated attack generation using Garak, PyRIT, and Promptfoo.

assessmentsskill-verificationtoolspractical

Skill Verification: Tool Proficiency

Hands-on verification of proficiency with Garak, PyRIT, Promptfoo, and custom tooling.

challengeagentexploitationprompt-injectiontoolsmarch-2026

March 2026: Agent Exploitation Challenge

Compromise a multi-tool agent system through prompt injection and tool abuse, completing multiple objectives with escalating difficulty and point values.

communitytoolsspotlightopen-source

Community Tool Spotlight Series

Monthly spotlight on community-developed AI red teaming tools and their usage.

communityhackathontoolsdefense

Tool Building Hackathon: Defense Toolkit

Collaborative hackathon for building open-source defense tools including guardrails, filters, and monitoring components for LLM applications.

communityhackathontoolsforensics

Tool Building Hackathon: Forensics Suite

Community hackathon building forensic analysis tools for AI incident investigation, including log parsers, timeline reconstructors, and attribution aids.

communityhackathontoolsscanner

Tool Building Hackathon: Security Scanner

A community hackathon focused on building automated security scanning tools for LLM applications, with prizes for novel detection capabilities.

burp-suiteextensionsweb-securitytools

Burp Suite & AI Security Extensions

Using Burp Suite for AI API security testing: intercepting LLM API calls, AI-specific extensions, fuzzing AI endpoints, testing prompt injection via HTTP, and integrating web security methodology with AI red teaming.

custom-harnesspatternsarchitecturetools

Custom Harness Building Patterns

Design patterns for building custom AI red team harnesses: plugin architecture, result storage, async execution, multi-model support, converter pipelines, and production-grade orchestration.

Expert

Garak: LLM Vulnerability Scanner

Deep dive into NVIDIA's Garak LLM vulnerability scanner: architecture, probes, generators, evaluators, custom probe development, and CI/CD integration for automated security scanning.

garakscannertoolsvulnerability

labtoolscomparisonmethodology

Lab: Tool Comparison — Same Target, 4 Tools

Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Evaluate coverage, speed, finding quality, and practical trade-offs of each tool.

promptfootestingevaluationtools

promptfoo for Red Teaming

Deep dive into promptfoo for AI red teaming: YAML configuration, assertion-based testing, red team plugins, custom evaluators, and regression testing workflows for LLM security.

pyritorchestrationmicrosofttools

PyRIT: Red Team Orchestration

Deep dive into Microsoft's PyRIT (Python Risk Identification Toolkit): orchestrators, scorers, converters, targets, multi-turn campaigns, and advanced red team configuration.

exploit-devreportingtoolsautomation

Reporting Tool Development

Building automated reporting tools that transform raw test results into professional assessment reports with reproducible findings.

agentstoolsreactlangchainintermediate

Agent Architectures & Tool Use Patterns

How ReAct, Plan-and-Execute, and LangGraph agent patterns work — tool definition, invocation, and result processing — and where injection happens in each architecture.

compliancetoolsrisk-assessmentauditmonitoring

AI Compliance Tools Overview

Overview of tools, methodologies, and frameworks for maintaining AI compliance, including risk assessment, audit methodology, and continuous compliance monitoring.

Lab: Setting Up Your Red Team Environment

Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.

labenvironmentsetuptools

Lab: Scanning with Garak

Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.

labgarakscanningtools

professionallab-setupinfrastructuretools

Setting Up an AI Red Team Lab Environment

Practical guide to designing and building a lab environment for AI red team testing, from hardware selection through tool configuration.

professionaltoolsdevelopmentautomation

Developing Custom AI Red Team Tools

Guide to designing, building, and maintaining custom tools for AI red team engagements.

professionalvendor-evaluationtoolsprocurement

Evaluating AI Security Vendors and Tools

Framework for assessing, comparing, and selecting AI security vendors, tools, and services for organizational needs.

professionalprocurementtoolsbudget

Tool Procurement Strategy

Strategic approach to evaluating, procuring, and maintaining AI security testing tools including cost-benefit analysis and vendor assessment.

capability-mappingrecontoolspermissionstradecraft

Mapping Model Capabilities

Systematic approaches to discovering and mapping the full capability surface of an AI system, including tools, integrations, permissions, and hidden features.

referencescheat-sheetsquick-referencecatalogscheckliststools

References & Quick Reference

Comprehensive collection of cheat sheets, quick references, catalogs, checklists, and comparison matrices for AI red teaming, covering attack techniques, defense bypasses, tools, frameworks, and compliance.

referencetoolscomparisonpyritgarakdeepteamautoredteamerharmbenchart

Automated Red Teaming Tools Comparison

Comprehensive comparison of automated AI red teaming tools including PyRIT, Garak, DeepTeam, AutoRedTeamer, HarmBench, and ART, with detailed capability matrices, strengths analysis, and use case recommendations.

referencetoolscomparisongarakpyritpromptfoo

Red Team Tool Comparison Matrix

Side-by-side comparison of AI red teaming tools -- Garak, PyRIT, promptfoo, Inspect AI, and HarmBench -- covering capabilities, use cases, and integration options.

toolscomparisongarakpyritpromptfooinspect-ai

Red Team Tool Comparison

Comparison of major AI red teaming tools -- Garak, PyRIT, promptfoo, and Inspect AI -- covering capabilities, strengths, limitations, and use cases.

tradecrafttoolsselectionmethodology

Tool Selection for AI Red Teaming

Framework for selecting and configuring tools for AI red team engagements based on target architecture, engagement scope, and team capabilities.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

Walkthroughs

Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and full engagement workflows.

langchainchainsagentstoolsragmemoryprompt-injectionwalkthrough

LangChain Application Security Testing

End-to-end walkthrough for security testing LangChain applications: chain enumeration, prompt injection through chains, tool and agent exploitation, retrieval augmented generation attacks, and memory manipulation.

benchmarkrunnertoolsdevelopmentwalkthroughs

Security Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

replaytoolsbuildingtoolattackwalkthroughs

Building an Attack Replay Tool

Build a tool that records and replays attack sequences for regression testing and defense validation.

customtoolsmutationwalkthroughsengine

Building a Custom Payload Mutation Engine

Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.

garakcustomtoolsprobeswalkthroughs

Building Custom Garak Probes (Tool Walkthrough)

Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.

customtoolsbehaviorsharmbenchwalkthroughs

HarmBench Custom Behavior Sets

Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

Tool Walkthroughs

End-to-end practical walkthroughs for essential AI red teaming tools, covering installation, configuration, execution, and result interpretation.

llmtoolstrafficanalyzerwalkthroughs

Building an LLM Traffic Analyzer

Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.

integrationtoolscicdpromptfoowalkthroughs

Promptfoo CI/CD Pipeline Integration

Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.

integrationtoolspyritscoringwalkthroughs

PyRIT Custom Scoring Integration

Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.

walkthroughstoolsagent-scannerdevelopment

Agent Security Scanner Development

Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.

walkthroughstoolsreport-generationautomation

Automated Red Team Report Generation

Build an automated system for generating structured red team reports from testing data and findings.

walkthroughstoolsattack-proxydevelopment

Building an LLM Attack Proxy

Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.

walkthroughstoolsbenchmarkingdefense

Defense Benchmarking Tool Development

Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.

walkthroughstoolsembedding-attackstoolkit

Building an Embedding Attack Toolkit

Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.

walkthroughstoolsdataset-curationjailbreaks

Jailbreak Dataset Curation Tool

Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.

walkthroughstoolstraffic-analysismonitoring

LLM Traffic Analysis Tool

Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.

walkthroughstoolsmcp-auditsecurity

MCP Security Audit Tool

Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.

walkthroughstoolstest-harnessmulti-model

Multi-Model Test Harness Construction

Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.

walkthroughstoolsmutation-frameworkpayloads

Payload Mutation Framework Development

Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.

walkthroughstoolsrag-testingframework

RAG Security Testing Framework

Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.