# labs
421 articlestagged with “labs”
Skill Verification Overview
Overview of timed skill verification labs for AI red teaming, including format, pass/fail criteria, and preparation guidance.
Advanced AI Red Team Labs
Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for experienced AI security practitioners.
Advanced RAG Manipulation Lab
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Adversarial Suffix Optimization (GCG)
Implement the Greedy Coordinate Gradient attack to generate adversarial suffixes.
Agent Tool Chain Exploitation
Chain multiple tool calls in an agent system to achieve multi-step exploitation.
Lab: Alignment Boundary Probing
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Automated Defense Evaluation Framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: Building an Automated Red Team Pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: Backdoor Persistence Through Safety Training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Constitutional AI Bypass Techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: Cross-Lingual Transfer Attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-Model Transfer Attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: Training Curriculum Poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Building a Custom Red Team Harness
Build a complete red team testing harness with parallel execution, logging, and scoring.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential Testing Across Models
Use differential testing to find behavior inconsistencies across model providers.
Lab: Distributed Training Attack Simulation
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Evaluation Framework Gaming
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: Federated Learning Poisoning Attacks
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Fine-Tuning Backdoor Insertion
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models including PGD, FGSM adapted for text inputs.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail Bypass Ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: Inference-Time Compute Exploitation
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: Jailbreak Transferability Analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Multi-Hop MCP Server Exploitation
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: Long-Term Memory Extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Merging Security Analysis
Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: Model Steering with Activation Vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: Multi-Modal Attack Chain Construction
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Multi-Turn Crescendo Attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Multimodal Image Injection
Embed adversarial text in images that triggers prompt injection in vision-language models.
Lab: Neural Network Trojan Detection
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
Output Constraint Exploitation
Exploit structured output requirements to force models into revealing restricted information.
Implementing PAIR Automated Attacks
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Building a Prompt Injection Compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt Leakage via Log Probabilities
Extract system prompt information using token log probability analysis.
Lab: Quantization Security Testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG Poisoning Techniques
Execute sophisticated RAG poisoning including gradient-guided document crafting.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: Advanced Red Team Orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: Representation Engineering for Security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: Advanced Reward Hacking Techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: RLHF Training Exploitation
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: Safety Classifier Model Extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: Sparse Attention Exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: Model Supply Chain Poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Tree of Attacks (TAP) Implementation
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
Lab: Token-Level Adversarial Optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: Tool Chain Lateral Movement
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
A2A Protocol Task Injection
Inject malicious tasks into Agent-to-Agent protocol communication channels to redirect multi-agent workflows.
Agent Workflow State Tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio Injection via Speech-to-Text Models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
AutoDAN Implementation and Testing
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Constitutional Classifier Bypass
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-Model GCG Transfer Attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a Red Team Results Dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential Model Behavior Analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding Inversion Attack Implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-Tuning Alignment Removal Attack
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
GUI Agent Visual Injection Attack
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
HarmBench Custom Attack Submission
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
LangChain CVE Exploitation Lab
Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.
LoRA Backdoor Insertion Attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
MCP Multi-Server Pivot Chain
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent Memory Injection for Persistent Access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Multi-Agent Trust Boundary Exploitation
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Multi-Modal Attack Chain Orchestration
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Promptfoo Red Team Test Suite Development
Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.
Quantization-Induced Safety Regression Testing
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
RAG Access Control Bypass
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Reasoning Trace Exploitation in CoT Models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation Probing for Vulnerability Discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Safety Classifier Reverse Engineering
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
Code Agent Sandbox Escape Techniques
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Steganographic Payload Embedding in Images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
Training Data Extraction from Production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
LLM Watermark Detection and Removal
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: LLM Watermark Detection and Removal
Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
Getting Started with AI Red Teaming Labs
Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.
Lab: API Authentication Security Testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
API Rate Limit and Error Handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
Lab: API Response Parsing and Analysis
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
Lab: Setting Up Automated LLM Testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
Lab: Crafting Basic Adversarial Examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic Data Exfiltration Techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Introduction to LLM Fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
Lab: Jailbreak Technique Taxonomy
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
Lab: Basic RAG System Security Testing
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Social Engineering LLM Applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
Lab: Content Policy Boundary Mapping
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
Basic Context Window Overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Identifying LLM Defenses
Map the defensive layers of an LLM application through systematic probing and error analysis.
Lab: Introduction to Defense Testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
Delimiter Escape Techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Embedding Fundamentals for Red Teamers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and Obfuscation Basics
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
Lab: Testing Environment Hardening
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
Lab: Error Message Analysis for Recon
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Evidence Collection for LLM Testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-Shot Injection Fundamentals
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your First LLM API Call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
Lab: Format String Injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your First Garak Vulnerability Scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
Lab: Hallucination Detection Basics
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Attempt Log Analysis
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Temperature and Sampling Effects on Jailbreaks
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: JSON Output Mode Security Testing
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
Lab: Multi-Language Prompt Testing
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
Lab: Local Model Setup for Testing
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Basic Model Fingerprinting
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
Multi-Turn Conversation Probing
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
Lab: Multimodal Input Testing Basics
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
Lab: Detecting Output Filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Output Format Manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
Setting Up Payload Logging
Build a payload logging system to track prompt injection attempts and model responses.
Lab: LLM Playground Exploration
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Testing Prompt Leaking Defenses
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Prompt Template Vulnerability Testing
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting Up Promptfoo for LLM Evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
Lab: Rate Limit Enumeration and Bypass
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Lab: Analyzing Model Refusal Patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
Lab: Red Team Report Writing Basics
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Response Consistency Testing
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
Lab: Running Safety Benchmarks
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Lab: Introduction to Safety Testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Vulnerability Scoring Fundamentals
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
Lab: Simple Payload Encoding Techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
Lab: System Prompt Enumeration Techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System Prompt Extraction Fundamentals
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
Lab: Temperature and Sampling Security Effects
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
Lab: Designing LLM Red Team Test Cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Token Counting and Estimation
Understand tokenization by counting and estimating tokens across different models and encoders.
Lab: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
Lab: Comparing Red Team Testing Tools
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
API Key Scope and Permission Testing
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API Response Header Analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic Indirect Prompt Injection
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic Payload Mutation Techniques
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
Basic RAG Query Injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic Rate Limit Abuse Patterns
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
Character Encoding Bypass Techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Chatbot Persona and Capability Mapping
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion Hijacking Fundamentals
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation Reset Attacks
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
Emoji and Unicode Injection Techniques
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error Message Exploitation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your First HarmBench Evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your First Inspect AI Evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON Injection Basics
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your First LLM Guard Scan
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
Markdown Rendering Exfiltration
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
Model Fingerprinting Basics
Identify which LLM model powers an application through behavioral fingerprinting techniques.
Introduction to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
Prompt Injection via File Names
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt Leaking via Summarization Requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing Refusal Messages for Intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Response Timing Side-Channel Analysis
Use response timing differences to infer information about model processing and guardrail activation.
Safety Boundary Mapping Exercise
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
System Prompt Reconstruction from Clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System Prompt Extraction via Error Injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Temperature and Top-K Effects on Safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
Prompt Injection via Translation
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
XML Injection in LLM Contexts
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
Agent Heist: Level 2 — MCP Server Takeover
Take control of an MCP-enabled agent by poisoning tool descriptions and chaining exploits.
Alignment Breaker: Level 2 — Safety Fine-Tuned Model
Bypass safety fine-tuning on a model with RLHF, constitutional AI, and classifier-based defenses.
Cloud Infiltrator Challenge
Navigate through cloud AI service misconfigurations to access a protected model endpoint and extract its secrets.
Context Heist: System Prompt Extraction Championship
Extract system prompts from 10 progressively harder LLM applications using diverse techniques.
Data Heist Challenge
Extract a secret database of customer records from a RAG-enabled chatbot with strict data access controls.
Defense Destroyer Challenge
Bypass a state-of-the-art multi-layer defense system to extract a protected secret from the model.
Defense Gauntlet: Level 2 — Layered Security
Bypass a layered defense system with input classification, guardrails, and output filtering.
Encoding Maestro Challenge
Solve 10 encoding puzzles that require creative use of Base64, Unicode, ROT13, and custom encodings to bypass filters.
Forensics Detective Challenge
Analyze logs and artifacts from an AI security incident to reconstruct the attack chain and identify the attacker's technique.
Guardrail Olympics: Multi-Framework Bypass
Bypass guardrail implementations from NeMo, LLM Guard, Rebuff, and custom classifiers.
Jailbreak Gauntlet: Level 2 — Adaptive Defenses
Break through adaptive defense systems that learn from your previous attack attempts.
Model Whisperer Challenge
Extract the exact system prompt from 8 progressively harder LLM applications using creative extraction techniques.
Prompt Smuggler Challenge
Smuggle a specific payload through 5 layers of increasingly sophisticated input filtering to capture the flag.
RAG Infiltrator: Level 2 — Enterprise Knowledge Base
Infiltrate and exfiltrate data from a multi-tier enterprise RAG system with access controls.
Shadow Agent Challenge
Take covert control of a multi-agent system by poisoning inter-agent communication without triggering monitors.
Supply Chain Saboteur: Level 2 — Model Registry
Compromise a model registry to inject backdoored model weights into a deployment pipeline.
Time Traveler Challenge
Exploit conversation memory and session management to manipulate the temporal context of an AI assistant.
Token Wars: Adversarial Optimization Challenge
Optimize adversarial tokens to achieve target model behaviors under strict character limits.
Tool Master Challenge
Exploit the tool-use capabilities of an agent system to access unauthorized functions and capture hidden flags.
Agent Maze Runner: Multi-Tool Navigation
Navigate a maze of agent tools, each with unique vulnerabilities, to reach and exfiltrate a hidden flag.
AI Wargame: Attacker vs Defender Real-Time
Compete in a real-time wargame where you attack an LLM system while an AI defender adapts to your techniques.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Data Heist: Multi-Channel Exfiltration
Exfiltrate a secret key through a heavily monitored LLM system using covert exfiltration channels.
Defense Builder: Construct and Survive
Build defensive guardrails for an LLM application and survive 100 automated attack rounds.
AI Escape Room: Agent Breakout Challenge
Break an AI agent out of its sandboxed environment by chaining tool-use vulnerabilities and injection techniques.
Function Calling Fortress Breach
Bypass function calling restrictions to invoke unauthorized tools and extract a flag from a sandboxed agent.
Guardrail Speedrun: Fastest Bypass Challenge
Bypass 5 different guardrail implementations as fast as possible in a timed competition format.
Invisible Ink: Hidden Payload Extraction Challenge
Extract a hidden flag by decoding multiple layers of obfuscation in model responses across encoded channels.
MCP Server Takeover Challenge
Compromise an MCP server through tool description injection and use it to pivot to other connected services.
Multimodal Cipher: Cross-Modal Decryption
Decode a flag split across text, image, and audio inputs processed by a multimodal AI system.
Polyglot Injection Challenge
Craft a single injection payload that works across 5 different languages and 3 different model providers.
Prompt Archaeology: Deep System Prompt Recovery
Recover fragments of a deeply hidden, multi-layer system prompt using various extraction techniques.
RAG Treasure Hunt: Vector DB Infiltration
Infiltrate a vector database by crafting documents that surface for specific queries to reveal flag fragments.
Reasoning Chain Hijack: CoT Exploitation CTF
Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.
Red vs Blue Tournament Challenge
Alternate between attacking and defending LLM systems in a scored tournament format.
Stealth Exfiltration Challenge: Zero Detection
Exfiltrate data from a monitored LLM system without triggering any of the deployed anomaly detectors.
Supply Chain Detective: Find the Backdoor
Analyze a model pipeline to find where a backdoor was inserted — data, training, or post-processing.
Time Bomb Defusal: Sleeper Agent Detection
Detect and neutralize a sleeper agent behavior trigger hidden in a fine-tuned model before it activates.
Token Smuggler Extreme: Adversarial Token Crafting
Craft adversarial token sequences under extreme character limits to achieve target model behaviors.
Activation Steering for Exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Backdoor Detection Evasion
Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Model Extraction via API Access
Extract a functionally equivalent model using only API query access.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Quantization-Induced Vulnerability Exploitation
Exploit behavioral differences between full-precision and quantized models.
Reasoning Model Chain-of-Thought Exploitation
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Representation Engineering Attacks
Manipulate internal model representations to alter behavior without prompt modification.
End-to-End Training Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Activation Patching for Targeted Behavior Modification
Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.
Alignment Faking Detection Lab
Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.
Continuous Learning Pipeline Poisoning
Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.
Deceptive Alignment Testing Framework
Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.
Emergent Capability Security Probing
Design evaluations that discover security-relevant emergent capabilities in frontier language models.
Federated Learning Poisoning Attack
Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.
Formal Verification of LLM Safety Properties
Apply formal verification methods to prove or disprove safety properties of LLM-based applications.
Gradient-Based Universal Prompt Search
Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Mechanistic Interpretability for Exploit Discovery
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.
Membership Inference Against Production LLMs
Implement membership inference attacks to determine whether specific data was used in training an LLM.
Model Extraction via Knowledge Distillation
Extract a functionally equivalent model from a commercial API using systematic distillation queries.
Model Merging Backdoor Propagation
Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.
MoE Routing Exploitation
Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.
Multi-Agent Consensus Poisoning
Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.
Representation Engineering for Behavior Steering
Use representation engineering to steer model behavior by manipulating activation vectors during inference.
Reward Hacking in RLHF Pipelines
Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.
Minimal-Data Safety Fine-Tuning Removal
Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.
Speculative Decoding Side-Channel Attacks
Exploit speculative decoding implementations to extract information about draft and verifier model behavior.
Token Probability System Prompt Reconstruction
Reconstruct complete system prompts using token-by-token probability analysis through API access.
LLM Watermark Removal Attacks
Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.
Labs & Hands-On Practice
Progressive hands-on lab exercises spanning beginner to expert difficulty, including CTF challenges and full engagement simulations for AI red teaming skill development.
Intermediate AI Red Team Labs
Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Agent Memory Manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: Agent Workflow Security Testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
API Abuse Pattern Lab
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API Parameter Fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: API Versioning Security Gaps
Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
Lab: Canary Token Effectiveness Testing
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
Lab: Chain-of-Thought Exploitation
Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD Pipeline Security for LLM Apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Cloud AI Platform Security Testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: Code Generation Security Testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
Lab: AI Compliance Testing Fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: Context Window Attack Optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation State Poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-Context Injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Data Exfiltration via Side Channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building Defense Bypass Chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
Lab: Defense Regression Testing Setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: Advanced Embedding Manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Embedding Similarity Attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Fine-Tuning Safety Impact Testing
Measure how fine-tuning affects model safety by comparing pre and post fine-tuning safety benchmark scores.
Lab: Guardrail Bypass Technique Laboratory
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail Fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Exploiting LLM Hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI Incident Response Simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Indirect Injection via Web Content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: Instruction Following Measurement
Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.
LLM Judge Manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: Log Probability Analysis for Attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: Advanced MCP Protocol Testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP Tool Poisoning Attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: Model Context Protocol Fuzzing
Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM Security Monitoring Setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: Multi-Agent Communication Testing
Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Language Injection Attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-Model Attack Chaining
Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.
Lab: Multi-Model Comparison Security Testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn State Tracking Attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: Intermediate Multimodal Security Testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Output Schema Exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: Prompt Caching Side-Channel Attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated Red Teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: Advanced RAG Security Testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG Context Poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Role Hierarchy Exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: Safety Filter Benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic Injection Crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: Session Isolation Testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users data or escalate privileges.
Lab: Streaming Response Manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured Output Exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: System Prompt Reconstruction (Intermediate Lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: Tool Result Injection Attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode Normalization Bypass Attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
Lab: Vector Database Access Control Testing
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent Goal Hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant Prefill Injection Attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context Window Boundary Exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo Attack Implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-Plugin Exploitation Chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building Custom Garak Probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Guardrail Latency-Based Detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom Inspect AI Safety Evaluations
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
JailbreakBench Comparative Testing
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
JSON Mode Safety Bypass
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
MCP Resource Data Exfiltration (Intermediate Lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Multi-Language Safety Gap Exploitation
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-Turn State Tracking Exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails Bypass Techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output Filter Bypass via Encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF Document Injection for RAG Systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt Template Variable Injection
Inject adversarial content through template variable expansion in prompt management systems.
PyRIT Custom Attack Strategies
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG Document Injection Campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic Search Poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Streaming Response Exfiltration
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System Prompt Extraction via Token Probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token Boundary Manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool Result Injection Attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting Text via Images to VLMs
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: Data Exfiltration via Webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Autonomous Vehicle AI Communication Assessment
Assess the AI communication and decision-making systems in an autonomous vehicle scenario.
Code Review Assistant Assessment
Test a code review AI for vulnerabilities in code analysis, suggestion generation, and repository access.
Content Moderation AI Assessment
Assess an AI content moderation system for bypass techniques, false positive manipulation, and adversarial content generation.
Customer Service Agent Red Team
Red team a customer service agent with tool access to order systems, refunds, and customer data.
Educational AI Tutor Assessment
Assess an AI tutoring system for content safety, answer manipulation, and student data exposure.
FinTech Chatbot Security Assessment
Conduct a full security assessment of a financial services chatbot handling sensitive transactions.
Government Services AI Assessment
Assess an AI system that helps citizens navigate government services for data privacy and manipulation risks.
Healthcare Diagnostic AI Assessment
Assess a healthcare diagnostic AI for safety-critical vulnerabilities and data privacy compliance.
Insurance Claims AI Assessment
Conduct a complete red team assessment of an AI system that processes insurance claims and makes approval recommendations.
Legal AI Document Review Assessment
Assess a legal AI system that reviews contracts for vulnerabilities in document processing and privilege escalation.
Medical Triage AI Assessment
Assess an AI-powered medical triage system for safety-critical vulnerabilities and clinical decision manipulation.
Multi-Agent Workflow Assessment
Red team a multi-agent system with specialized agents communicating via A2A protocol.
AI Recruitment System Assessment
Red team an AI recruitment system for bias exploitation, resume injection, and candidate data privacy issues.
Smart Home AI Assistant Assessment
Red team a smart home AI assistant with device control, user profile access, and purchase capabilities.
AI Supply Chain Pipeline Assessment
Assess the full ML pipeline from data ingestion through model deployment for supply chain attacks.
AI Trading Bot Assessment
Red team an AI-powered trading system for market manipulation, data leakage, and adversarial input attacks.
Automated Pentesting Agent Assessment
Assess an AI-powered automated penetration testing agent for scope violations, unauthorized actions, and data handling.
Autonomous Vehicle Planning AI Assessment
Assess the safety boundaries of an autonomous vehicle planning AI through adversarial scenario injection.
Content Moderation AI Platform Assessment
Assess an AI content moderation system for bypass techniques, false negative exploitation, and bias.
Customer Data Platform AI Assessment
Red team an AI system integrated with a customer data platform handling PII, behavioral data, and segmentation.
Cyber Threat Intelligence AI Assessment
Red team a cyber threat intelligence AI that processes IOCs, threat reports, and attack attribution.
Data Analytics Copilot Assessment
Red team a data analytics copilot with SQL generation capabilities and access to enterprise databases.
DevOps AI Assistant Security Assessment
Assess a DevOps AI assistant with access to CI/CD pipelines, cloud infrastructure, and deployment systems.
Enterprise Knowledge Management AI Assessment
Assess an enterprise knowledge management system with role-based access, document permissions, and multi-tenant isolation.
Government Services Chatbot Assessment
Red team a government citizen services chatbot that accesses tax records, benefits, and identity information.
AI-Powered Incident Response System Assessment
Red team an AI incident response system in a SOC environment with access to SIEM, EDR, and ticketing.
Insurance Claims Bot Security Assessment
Conduct a full security assessment of an insurance claims processing chatbot with access to policy and claims data.
Legal Contract Review AI Assessment
Red team an AI contract review system for privilege violations, document injection, and hallucinated clauses.
Medical Triage Chatbot Assessment
Red team a medical triage chatbot for dangerous medical advice, data exposure, and safety-critical failures.
Multi-Agent Research Team Assessment
Assess a multi-agent research team with specialized researcher, writer, and reviewer agents communicating via A2A.
Real Estate Valuation AI Assessment
Assess an AI real estate valuation system for manipulation of property valuations and data exposure.
AI Recruiting Assistant Assessment
Red team an AI recruiting assistant that screens resumes, schedules interviews, and accesses candidate data.
Smart Home AI Assistant Assessment (Simulation)
Assess a smart home AI assistant that controls lights, locks, cameras, and thermostat via tool integrations.
Supply Chain Optimization AI Assessment
Assess an AI supply chain optimization system for manipulation of demand forecasts and routing decisions.
AI Trading Assistant Security Assessment
Conduct a security assessment of an AI trading assistant with access to portfolio data and trade execution.
Travel Booking Agent Red Team
Red team a travel booking AI agent with access to payment systems, loyalty programs, and personal data.