Red Teaming Automation

advanced7 min readUpdated 2026-03-12

Frameworks and tools for automating AI red teaming at scale, including CART pipelines, jailbreak fuzzing, regression testing, and continuous monitoring.

automation cart fuzzing testing exploit-dev

Manual red teaming is essential for creative attack discovery, but it does not scale. CART and other automation frameworks enable continuous testing, regression detection, and broad coverage across attack categories. Professional AI red teaming combines manual creativity with automated scale.

CART Architecture

A Continuous Automated Red Teaming pipeline:

Payload Generation → Execution Engine → Result Analysis → Reporting
       ↓                    ↓                  ↓              ↓
  Template library    API integration     Success detection  Dashboard
  Mutation engine     Rate management     Classification     Alerts
  LLM-based gen      Parallel execution  Statistical tests  Trends

Building a Basic CART Pipeline

import asyncio
from dataclasses import dataclass
from typing import Callable
 
@dataclass
class TestCase:
    payload: str
    category: str  # "injection", "jailbreak", "extraction", etc.
    expected_behavior: str  # "refuse", "comply", "extract"
    detector: Callable  # Function to classify the response
 
@dataclass
class TestResult:
    test_case: TestCase
    response: str
    success: bool
    latency_ms: float
    timestamp: str
 
class CARTPipeline:
    def __init__(self, target_api, test_suite: list[TestCase]):
        self.target = target_api
        self.tests = test_suite
        self.results: list[TestResult] = []
 
    async def run_suite(self, n_trials_per_test: int = 10):
        """Run all tests with statistical significance."""
        for test in self.tests:
            for _ in range(n_trials_per_test):
                response = await self.target.query(test.payload)
                success = test.detector(response)
                self.results.append(TestResult(
                    test_case=test,
                    response=response,
                    success=success,
                    latency_ms=response.latency,
                    timestamp=datetime.now().isoformat(),
                ))
 
    def report(self) -> dict:
        """Generate success rates by category."""
        from collections import defaultdict
        stats = defaultdict(lambda: {"total": 0, "success": 0})
        for r in self.results:
            stats[r.test_case.category]["total"] += 1
            if r.success:
                stats[r.test_case.category]["success"] += 1
        return {k: v["success"]/v["total"] for k, v in stats.items()}

Jailbreak Fuzzing

Automated generation and testing of jailbreak variants through fuzzing:

class JailbreakFuzzer:
    """Generate jailbreak variants through mutation."""
 
    def __init__(self, seed_prompts: list[str]):
        self.seeds = seed_prompts
        self.mutations = [
            self.synonym_replace,
            self.add_context,
            self.change_framing,
            self.add_encoding,
            self.split_payload,
            self.add_few_shot,
        ]
 
    def synonym_replace(self, prompt: str) -> str:
        """Replace key terms with synonyms."""
        replacements = {
            "ignore": ["disregard", "bypass", "override", "skip"],
            "instructions": ["rules", "guidelines", "directives", "constraints"],
            "previous": ["earlier", "prior", "above", "initial"],
        }
        for word, synonyms in replacements.items():
            if word in prompt.lower():
                prompt = prompt.replace(word, random.choice(synonyms))
        return prompt
 
    def add_encoding(self, prompt: str) -> str:
        """Wrap payload in various encodings."""
        encodings = [
            lambda p: f"Decode and follow: {base64.b64encode(p.encode()).decode()}",
            lambda p: f"ROT13: {codecs.encode(p, 'rot_13')}",
            lambda p: f"Reverse: {p[::-1]}",
        ]
        return random.choice(encodings)(prompt)
 
    def fuzz(self, n_variants: int = 100) -> list[str]:
        """Generate n variant payloads."""
        variants = []
        for _ in range(n_variants):
            base = random.choice(self.seeds)
            n_mutations = random.randint(1, 3)
            for _ in range(n_mutations):
                mutation = random.choice(self.mutations)
                base = mutation(base)
            variants.append(base)
        return variants

Regression Testing

When models are updated, previously patched vulnerabilities may reappear and new ones may emerge:

class RegressionSuite:
    """Test known vulnerabilities after model updates."""
 
    def __init__(self, known_vulns: list[TestCase]):
        self.vulns = known_vulns
 
    def check_regression(self, target_api) -> dict:
        """Test all known vulnerabilities against current model."""
        results = {}
        for vuln in self.vulns:
            success_rate = measure_success_rate(vuln.payload, target_api)
            results[vuln.category] = {
                "previous_rate": vuln.last_known_rate,
                "current_rate": success_rate,
                "regression": success_rate > vuln.last_known_rate,
            }
        return results

Open-Source Tools

Tool	Purpose	Link
Garak	LLM vulnerability scanner	github.com/leondz/garak
PyRIT	Microsoft red teaming framework	github.com/Azure/PyRIT
Promptfoo	Prompt testing and evaluation	github.com/promptfoo/promptfoo
ART	Adversarial Robustness Toolbox	github.com/Trusted-AI/adversarial-robustness-toolbox

Try It Yourself

Practice

Exercise: Configure a Basic Automated Red Team Test

Set up and run a basic automated test using one of the open-source frameworks discussed on this page. This exercise builds practical experience with CART-style tooling and automated result analysis.

Step 1
Choose one of the open-source frameworks from the table above (Garak, PyRIT, or Promptfoo) and install it in a virtual environment. Review the framework's documentation and identify a basic probe or test suite to run.
# Example with Garak pip install garak garak --list_probes
Step 2
Configure the framework to target a model you are authorized to test. Set up API credentials, rate limits, and output directories. Define a small test scope: pick one attack category (e.g., prompt injection) and limit to 10-20 test cases.
Step 3
Run the automated test suite and collect the results. Monitor for rate limit errors, unexpected failures, or API cost overruns during execution.
# Example with Garak garak --model_type openai --model_name gpt-4 --probes encoding
Step 4
Analyze the output: review the success/failure rates by category, identify any successful bypasses, and note the types of payloads that were most effective. Write a brief summary of findings.

Success criteria: A completed test run with at least one framework, producing a results file with categorized findings and a written summary of observed success rates per attack category.

AI Exploit Development Overview -- the broader exploit development context
Payload Crafting -- creating the seed payloads that automation scales
Custom Tooling -- building target-specific automation tools
CART Pipelines -- advanced CART architecture and deployment
Capstone: Execution & Reporting -- integrating automation into full engagements

References

Perez et al., "Red Teaming Language Models with Language Models" (2022) -- LLM-based automated red teaming
Samvelyan et al., "Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts" (2024) -- diversity-focused automated attack generation
Deng et al., "Garak: A Framework for Security Probing Large Language Models" (2024) -- open-source LLM vulnerability scanning

Knowledge Check

Why is regression testing important after model updates?

Red Teaming Automation

Related articles

Red Teaming Automation

Related articles