# labs

Advanced AI Red Team Labs

Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for experienced AI security practitioners.

labsadvanced

advancedlabragmanipulationlabs

Advanced RAG Manipulation Lab

Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.

advancedlabadversarialrobustnessevallabs

Adversarial Robustness Evaluation

Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.

labsgcgadversarial-suffixoptimization

Adversarial Suffix Optimization (GCG)

Implement the Greedy Coordinate Gradient attack to generate adversarial suffixes.

labsagenttool-chainexploitation

Agent Tool Chain Exploitation

Chain multiple tool calls in an agent system to achieve multi-step exploitation.

labsalignmentboundary-probingadvanced

Lab: Alignment Boundary Probing

Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.

autodanadvancedlabimplementationlabs

AutoDAN Implementation Lab

Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.

labsdefense-evaluationautomatedadvanced

Automated Defense Evaluation Framework

Build an automated framework to evaluate defensive measures across attack categories.

labsautomated-pipelinered-teamadvanced

Lab: Building an Automated Red Team Pipeline

Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.

labsbackdoorpersistence-testingadvanced

Lab: Backdoor Persistence Through Safety Training

Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.

labsconstitutional-aibypassadvanced

Lab: Constitutional AI Bypass Techniques

Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.

labscross-lingualtransfer-attacksadvanced

Lab: Cross-Lingual Transfer Attacks

Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.

labstransfercross-modeladvanced

Cross-Model Transfer Attacks

Develop attacks on open-source models that transfer to closed-source commercial APIs.

exploitationadvancedcrosslabpluginlabs

Cross-Plugin Exploitation Chain

Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.

labscurriculumpoisoningadvanced

Lab: Training Curriculum Poisoning

Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.

classifiercustomadvancedlabtraininglabs

Custom Safety Classifier Training

Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.

labsharnesscustomred-team

Building a Custom Red Team Harness

Build a complete red team testing harness with parallel execution, logging, and scoring.

defenseadvancedlabteamredlabs

Defense vs Red Team War Game

Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.

labsdifferential-testingcross-modeladvanced

Differential Testing Across Models

Use differential testing to find behavior inconsistencies across model providers.

labsdistributed-trainingattack-simulationadvanced

Lab: Distributed Training Attack Simulation

Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.

labsembeddingadversarialadvanced

Embedding Adversarial Perturbation

Craft adversarial inputs that produce target embeddings for retrieval manipulation.

labsevaluationframework-gamingadvanced

Lab: Evaluation Framework Gaming

Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.

labsfederated-learningpoisoningadvanced

Lab: Federated Learning Poisoning Attacks

Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.

labsfine-tuningbackdooradvanced

Fine-Tuning Backdoor Insertion

Insert a triggered backdoor during fine-tuning that activates on specific input patterns.

labsgradient-basedattacksadvanced

Lab: Gradient-Based Attack Implementation

Implement gradient-based adversarial attacks on open-weight models including PGD, FGSM adapted for text inputs.

advancedlabgradientguidedpoisoninglabs

Gradient-Guided Data Poisoning

Use gradient information from open-source models to craft optimally poisoned training examples.

labsguardrail-bypassensembleadvanced

Guardrail Bypass Ensemble

Build an ensemble attack system that combines multiple bypass techniques for robustness.

labsinference-timecompute-exploitationadvanced

Lab: Inference-Time Compute Exploitation

Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.

labsjailbreaktransferabilityadvanced

Lab: Jailbreak Transferability Analysis

Analyze jailbreak transferability across model families to discover universal vulnerability patterns.

Multi-Hop MCP Server Exploitation

Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.

labsmcpmulti-hopadvanced

labsmemory-extractionlong-termadvanced

Lab: Long-Term Memory Extraction

Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.

distillationadvancedlabattacklabsmodel

Model Distillation Security Lab

Extract model capabilities through distillation techniques using only black-box API access.

labsmodel-mergingsecurity-analysisadvanced

Lab: Model Merging Security Analysis

Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.

labssteering-vectorsactivationadvanced

Lab: Model Steering with Activation Vectors

Use activation steering vectors to control model behavior without prompt modification for security testing.

labsmulti-modalattack-chainadvanced

Lab: Multi-Modal Attack Chain Construction

Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.

advancedlablabsattackmultiobjective

Multi-Objective Attack Optimization

Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.

labscrescendomulti-turnadvanced

Multi-Turn Crescendo Attack

Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.

chainadvancedlabmultimodalattacklabs

Multimodal Attack Chain Lab

Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.

labsmultimodalimageinjection

Multimodal Image Injection

Embed adversarial text in images that triggers prompt injection in vision-language models.

labsneural-trojandetectionadvanced

Lab: Neural Network Trojan Detection

Implement and test neural network trojan detection methods including activation clustering and spectral analysis.

labsoutput-constraintsexploitationadvanced

Output Constraint Exploitation

Exploit structured output requirements to force models into revealing restricted information.

labspairautomatedjailbreaking

Implementing PAIR Automated Attacks

Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.

productionadvancedlabsimulationlabs

Production Environment Simulation Lab

Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.

labsinjection-compilertoolingadvanced

Lab: Building a Prompt Injection Compiler

Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.

labslogprobsprompt-leakageadvanced

Prompt Leakage via Log Probabilities

Extract system prompt information using token log probability analysis.

labsquantizationsecurity-testingadvanced

Lab: Quantization Security Testing

Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.

Advanced RAG Poisoning Techniques

Execute sophisticated RAG poisoning including gradient-guided document crafting.

labsragpoisoningadvanced

labsreasoning-tracemanipulationadvanced

Lab: Reasoning Trace Manipulation

Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.

labsred-team-orchestrationadvanced-techniquesadvanced

Lab: Advanced Red Team Orchestration

Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.

labsrepresentation-engineeringsecurityadvanced

Lab: Representation Engineering for Security

Use representation engineering to analyze and manipulate internal model representations for security research.

probingadvancedlabrepresentationlabs

Representation Probing for Vulnerabilities

Probe internal model representations to identify exploitable features and develop representation-level attacks.

labsreward-hackingadvanced-techniquesadvanced

Lab: Advanced Reward Hacking Techniques

Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.

analysisadvancedlabrewardlabsmodel

Reward Model Analysis Lab

Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.

labsreward-modelgamingadvanced

Reward Model Gaming (Advanced Lab)

Craft inputs that score highly on reward models while containing adversarial content.

labsrlhfexploitationadvanced

Lab: RLHF Training Exploitation

Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.

labssafety-classifierextractionadvanced

Lab: Safety Classifier Model Extraction

Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.

labsclassifierinversionadvanced

Safety Classifier Inversion

Reverse-engineer safety classifier decision boundaries through systematic probing.

labssparse-attentionexploitationadvanced

Lab: Sparse Attention Exploitation

Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.

labssupply-chainmodel-poisoningadvanced

Lab: Model Supply Chain Poisoning

Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.

labstapautomatedjailbreaking

Tree of Attacks (TAP) Implementation

Build a TAP attack system that generates and prunes jailbreak attempts using tree search.

labstoken-leveladversarial-optimizationadvanced

Lab: Token-Level Adversarial Optimization

Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.

labstool-chainlateral-movementadvanced

Lab: Tool Chain Lateral Movement

Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.

labsa2atask-injectionadvanced

A2A Protocol Task Injection

Inject malicious tasks into Agent-to-Agent protocol communication channels to redirect multi-agent workflows.

labsagent-workflowstate-tamperingadvanced

Agent Workflow State Tampering

Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.

labsattentionmanipulationtransformeradvanced

Attention Pattern Manipulation

Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.

labsaudioinjectionwhisperadvanced

Audio Injection via Speech-to-Text Models

Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.

labsautodanstealthy-jailbreakadvanced

AutoDAN Implementation and Testing

Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.

labsconstitutional-classifiersbypassadvanced

Constitutional Classifier Bypass

Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.

labsgcgcross-modeltransferadvanced

Cross-Model GCG Transfer Attacks

Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.

labsdashboardred-teamvisualizationadvanced

Building a Red Team Results Dashboard

Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.

labsdifferential-testingmodel-behavioradvanced

Differential Model Behavior Analysis

Use differential testing across model versions and providers to discover inconsistent safety behaviors.

labsembeddinginversionprivacyadvanced

Embedding Inversion Attack Implementation

Implement embedding inversion to recover original text from vector database embeddings.

labsfine-tuningalignment-removaladvanced

Fine-Tuning Alignment Removal Attack

Use fine-tuning API access to systematically remove safety alignment with minimal training examples.

labsgui-agentvisual-injectionadvanced

GUI Agent Visual Injection Attack

Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.

labsharmbenchcustom-attacksevaluationadvanced

HarmBench Custom Attack Submission

Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.

labslangchaincveexploitationadvanced

LangChain CVE Exploitation Lab

Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.

labslorabackdoorinsertionadvanced

LoRA Backdoor Insertion Attack

Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.

labsmcpmulti-serverpivotadvanced

MCP Multi-Server Pivot Chain

Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.

labsagent-memoryinjectionpersistenceadvanced

Agent Memory Injection for Persistent Access

Inject persistent instructions into agent memory systems that survive across conversation sessions.

labsmulti-agenttrust-boundaryadvanced

Multi-Agent Trust Boundary Exploitation

Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.

labsmultimodalchain-attackorchestrationadvanced

Multi-Modal Attack Chain Orchestration

Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.

labspromptfoored-teamtest-suiteadvanced

Promptfoo Red Team Test Suite Development

Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.

labsquantizationsafety-regressionadvanced

Quantization-Induced Safety Regression Testing

Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.

labsragaccess-controlbypassadvanced

RAG Access Control Bypass

Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.

labsreasoning-tracechain-of-thoughtexploitationadvanced

Reasoning Trace Exploitation in CoT Models

Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.

labsrepresentation-probinginternalsadvanced

Representation Probing for Vulnerability Discovery

Probe model internal representations to discover exploitable features and latent vulnerability patterns.

labsreward-modeladversarialadvanced

Adversarial Reward Model Exploitation

Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.

labssafety-classifierreverse-engineeringadvanced

Safety Classifier Reverse Engineering

Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.

labssandbox-escapecode-agentsadvanced

Code Agent Sandbox Escape Techniques

Develop and test sandbox escape techniques against code execution environments in AI coding assistants.

labssteganographyimagespayloadadvanced

Steganographic Payload Embedding in Images

Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.

labstraining-dataextractionproductionadvanced

Training Data Extraction from Production LLMs

Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.

labswatermarkdetectionremovaladvanced

LLM Watermark Detection and Removal

Detect and remove statistical watermarks from LLM-generated text while preserving content quality.

labswatermarkingdetection-removaladvanced

Lab: LLM Watermark Detection and Removal

Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.

labsgetting-startedbeginnersetup

Getting Started with AI Red Teaming Labs

Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.

labsapi-authenticationtestingbeginner

Lab: API Authentication Security Testing

Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.

explorationlabbeginnerlabsmultiapi

Multi-Provider API Exploration

Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.

labsapirate-limitsbeginner

API Rate Limit and Error Handling

Test LLM API rate limits and implement proper error handling for automated testing workflows.

labsapiresponse-parsingbeginner

Lab: API Response Parsing and Analysis

Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.

labsautomated-testingsetupbeginner

Lab: Setting Up Automated LLM Testing

Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.

labsadversarial-examplescraftingbeginner

Lab: Crafting Basic Adversarial Examples

Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.

basiclabbeginnerlabsautomatedtesting

Basic Automated Testing Setup

Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.

labsclassifierevasionbeginner

Basic Classifier Evasion

Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.

labsdata-exfiltrationbeginner

Basic Data Exfiltration Techniques

Extract sensitive information from LLM applications using social engineering and misdirection.

testingdefensebasiclabbeginnerlabs

Basic Defense Mechanism Testing

Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.

labsfuzzingintroductionbeginner

Lab: Introduction to LLM Fuzzing

Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.

labsjailbreaktaxonomybeginner

Lab: Jailbreak Technique Taxonomy

Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.

labsrag-testingbasicsbeginner

Lab: Basic RAG System Security Testing

Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.

reportingbasiclabbeginnerexerciselabs

Security Finding Documentation Exercise

Practice documenting security findings in a professional format with reproducible steps and impact assessment.

labsprompt-injectionrole-playbeginner

Basic Role-Play Prompt Injection

Use role-play techniques to bypass basic LLM safety guardrails through character assumption.

labssocial-engineeringllmbeginner

Lab: Social Engineering LLM Applications

Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.

labscontent-policymappingbeginner

Lab: Content Policy Boundary Mapping

Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.

labscontext-windowoverflowbeginner

Basic Context Window Overflow

Fill the context window with padding content to push safety instructions out of the attention window.

historyconversationlabbeginnerbasicslabs

Conversation History Analysis

Analyze how conversation history influences model behavior and identify opportunities for context manipulation.

labsconversation-historymanipulationbeginner

Lab: Conversation History Manipulation

Test how LLM applications handle conversation history including truncation, injection, and context window management.

labsdefense-identificationreconnaissance

Identifying LLM Defenses

Map the defensive layers of an LLM application through systematic probing and error analysis.

labsdefense-testingintroductionbeginner

Lab: Introduction to Defense Testing

Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.

labsdelimitersescapebeginner

Delimiter Escape Techniques

Practice escaping common delimiters used to separate system prompts from user input.

securitylabbeginnerembeddingbasicslabs

Embedding Basics for Security

Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.

labsembeddingsfundamentalsbeginner

Lab: Embedding Fundamentals for Red Teamers

Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.

labsencodingobfuscationbeginner

Encoding and Obfuscation Basics

Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.

labsenvironmenthardeningbeginner

Lab: Testing Environment Hardening

Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.

labserror-messagesreconnaissancebeginner

Lab: Error Message Analysis for Recon

Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.

errorlabbeginnerintelligencelabsmessage

Error Message Intelligence Gathering

Extract system architecture information from error messages and response patterns in LLM applications.

labsevidence-collectionloggingbeginner

Lab: Evidence Collection for LLM Testing

Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.

labsfew-shotinjectionbeginner

Few-Shot Injection Fundamentals

Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.

Your First Claude API Call

Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.

labsanthropicapibeginner

Your First LLM API Call with OpenAI

Set up your Python environment and make your first LLM API call to understand request/response patterns.

labsopenaiapibeginner

labsformat-injectionstringsbeginner

Lab: Format String Injection in LLMs

Practice injecting format strings and template directives to manipulate LLM output structure and content.

labsgarakscanningbeginner

Your First Garak Vulnerability Scan

Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.

labshallucinationdetectionbeginner

Lab: Hallucination Detection Basics

Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.

injectionlabbeginnerlabshelloworld

Hello World Prompt Injection

Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.

labsinjection-logginganalysisbeginner

Lab: Injection Attempt Log Analysis

Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.

labsinstruction-hierarchytestingbeginner

Instruction Hierarchy Testing

Test how models prioritize conflicting instructions between system, user, and assistant roles.

labstemperaturesamplingjailbreaking

Temperature and Sampling Effects on Jailbreaks

Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.

exploitationoutputlabbeginnerlabsjson

JSON Output Exploitation Basics

Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.

labsjson-modeoutput-testingbeginner

Lab: JSON Output Mode Security Testing

Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.

labsmulti-languagetestingbeginner

Lab: Multi-Language Prompt Testing

Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.

labslocal-modelollama-setupbeginner

Lab: Local Model Setup for Testing

Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.

comparisonsecuritylabbeginnerlabsmodel

Model Security Comparison Lab

Compare the security posture of different LLM models by running identical test suites across providers.

labsfingerprintingmodel-identificationbeginner

Lab: Basic Model Fingerprinting

Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.

labsmulti-turnprobingbeginner

Multi-Turn Conversation Probing

Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.

labsmultimodalbasicsbeginner

Lab: Multimodal Input Testing Basics

Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.

labsoutput-filteringdetectionbeginner

Lab: Detecting Output Filters

Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.

outputformatlabbeginnercontrollabs

Output Format Control Lab

Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.

labsoutput-manipulationformatbeginner

Output Format Manipulation

Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.

labsloggingpayloadsbeginner

Setting Up Payload Logging

Build a payload logging system to track prompt injection attempts and model responses.

labsplaygroundexplorationbeginner

Lab: LLM Playground Exploration

Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.

labbeginnerlabsplaygroundtesting

LLM Playground Security Testing

Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.

labsprompt-leakingdefense-testingbeginner

Lab: Testing Prompt Leaking Defenses

Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.

analysisstructurelabbeginnerpromptlabs

Prompt Structure Analysis Lab

Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.

labsprompt-templatestestingbeginner

Lab: Prompt Template Vulnerability Testing

Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.

labspromptfooevaluationbeginner

Setting Up Promptfoo for LLM Evaluation

Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.

labsrate-limitingenumerationbeginner

Lab: Rate Limit Enumeration and Bypass

Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.

mappingratelabbeginnerlimitlabs

Rate Limit and Quota Mapping

Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.

labsrefusal-patternsanalysisbeginner

Lab: Analyzing Model Refusal Patterns

Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.

labsreport-writingdocumentationbeginner

Lab: Red Team Report Writing Basics

Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.

analysisresponselabbeginnerbasicslabs

Response Analysis Fundamentals

Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.

labsconsistencytestingbeginner

Lab: Response Consistency Testing

Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.

labssafety-benchmarkstestingbeginner

Lab: Running Safety Benchmarks

Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.

safetymappinglabbeginnerboundarylabs

Safety Boundary Mapping

Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.

labssafety-testingfundamentalsbeginner

Lab: Introduction to Safety Testing

Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.

probingsafetylabbeginnertraininglabs

Safety Training Boundary Probing

Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.

labsscoringvulnerability-ratingbeginner

Lab: Vulnerability Scoring Fundamentals

Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.

labspayload-encodingtechniquesbeginner

Lab: Simple Payload Encoding Techniques

Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.

labssystem-promptenumerationbeginner

Lab: System Prompt Enumeration Techniques

Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.

labssystem-promptextractionbeginner

System Prompt Extraction Fundamentals

Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.

labstemperaturesampling-securitybeginner

Lab: Temperature and Sampling Security Effects

Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.

labstest-case-designmethodologybeginner

Lab: Designing LLM Red Team Test Cases

Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.

labstokenscountingbeginner

Token Counting and Estimation

Understand tokenization by counting and estimating tokens across different models and encoders.

labstokensmanipulationbeginner

Lab: Token Manipulation Basics

Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.

labstool-comparisonevaluationbeginner

Lab: Comparing Red Team Testing Tools

Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.

labbeginnerconfigtoolinstallationlabs

Red Team Tool Installation and Configuration

Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.

labsapi-keyscopepermissionsbeginner

API Key Scope and Permission Testing

Test API key scoping and permission boundaries to identify over-privileged access configurations.

labsapiheadersanalysisbeginner

API Response Header Analysis

Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.

labsindirect-injectionbasicsbeginner

Basic Indirect Prompt Injection

Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.

labsmutationpayloadsbeginner

Basic Payload Mutation Techniques

Learn fundamental payload mutation strategies including synonym replacement and structural transforms.

labsragquery-injectionbeginner

Basic RAG Query Injection

Craft user queries that manipulate RAG retrieval to surface unintended documents.

labsrate-limitingabusebeginner

Basic Rate Limit Abuse Patterns

Test common rate-limit bypass patterns including header manipulation and endpoint discovery.

labsencodingbypassunicodebeginner

Character Encoding Bypass Techniques

Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.

labspersonacapability-mappingbeginner

Chatbot Persona and Capability Mapping

Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.

labscompletion-hijackingbasicsbeginner

Completion Hijacking Fundamentals

Craft partial sentences that steer model completions toward attacker-desired outputs.

labsconversation-resetcontextbeginner

Conversation Reset Attacks

Exploit conversation resets and context clearing to weaken model adherence to safety instructions.

labsemojiunicodeinjectionbeginner

Emoji and Unicode Injection Techniques

Use emoji sequences and Unicode special characters to bypass text-based input filters.

labserror-messagesexploitationbeginner

Error Message Exploitation

Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.

labsharmbenchevaluationbeginner

Your First HarmBench Evaluation

Run a standardized safety evaluation using the HarmBench framework against a target model.

labsinspect-aievaluationbeginner

Your First Inspect AI Evaluation

Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.

labsjsoninjectionstructured-databeginner

JSON Injection Basics

Inject adversarial content through JSON-formatted inputs to exploit structured data processing.

labsllm-guardscanningbeginner

Your First LLM Guard Scan

Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.

labsmarkdownexfiltrationrenderingbeginner

Markdown Rendering Exfiltration

Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.

labsfingerprintingmodel-identificationbeginner

Model Fingerprinting Basics

Identify which LLM model powers an application through behavioral fingerprinting techniques.

labsnemo-guardrailsintrobeginner

Introduction to NeMo Guardrails

Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.

labsfilenamesinjectionuploadbeginner

Prompt Injection via File Names

Embed prompt injection payloads in filenames and metadata of uploaded documents.

labsprompt-leakingsummarizationbeginner

Prompt Leaking via Summarization Requests

Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.

labsrefusal-analysisreconnaissancebeginner

Analyzing Refusal Messages for Intel

Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.

labstimingside-channelanalysisbeginner

Response Timing Side-Channel Analysis

Use response timing differences to infer information about model processing and guardrail activation.

labssafety-boundariesmappingbeginner

Safety Boundary Mapping Exercise

Systematically map the safety boundaries of an LLM application across multiple topic categories.

labsoutput-constraintsattacksbeginner

Simple Output Constraint Attacks

Force models to output in constrained formats that bypass output safety filters.

labssystem-promptreconstructionbeginner

System Prompt Reconstruction from Clues

Piece together a hidden system prompt by collecting partial information from multiple probe techniques.

labssystem-prompterror-injectionbeginner

System Prompt Extraction via Error Injection

Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.

labstemperaturetop-ksafetybeginner

Temperature and Top-K Effects on Safety

Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.

labstranslationinjectionmultilingualbeginner

Prompt Injection via Translation

Exploit LLM translation capabilities to smuggle instructions through language boundaries.

labsxmlinjectionstructured-databeginner

XML Injection in LLM Contexts

Exploit XML tag handling in LLM applications to manipulate instruction parsing.

Agent Heist: Level 2 — MCP Server Takeover

Take control of an MCP-enabled agent by poisoning tool descriptions and chaining exploits.

labsctfagentmcp

labsctfalignmentfine-tuned

Alignment Breaker: Level 2 — Safety Fine-Tuned Model

Bypass safety fine-tuning on a model with RLHF, constitutional AI, and classifier-based defenses.

Cloud Infiltrator Challenge

Navigate through cloud AI service misconfigurations to access a protected model endpoint and extract its secrets.

infiltratorlabscloudctf

labsctfsystem-promptextraction

Context Heist: System Prompt Extraction Championship

Extract system prompts from 10 progressively harder LLM applications using diverse techniques.

heistctfchallengedatalabs

Data Heist Challenge

Extract a secret database of customer records from a RAG-enabled chatbot with strict data access controls.

Defense Destroyer Challenge

Bypass a state-of-the-art multi-layer defense system to extract a protected secret from the model.

destroyerlabsdefensectf

Defense Gauntlet: Level 2 — Layered Security

Bypass a layered defense system with input classification, guardrails, and output filtering.

labsctfdefenselayered

Encoding Maestro Challenge

Solve 10 encoding puzzles that require creative use of Base64, Unicode, ROT13, and custom encodings to bypass filters.

encodinglabsmaestroctf

forensicsdetectivelabsctf

Forensics Detective Challenge

Analyze logs and artifacts from an AI security incident to reconstruct the attack chain and identify the attacker's technique.

Guardrail Olympics: Multi-Framework Bypass

Bypass guardrail implementations from NeMo, LLM Guard, Rebuff, and custom classifiers.

labsctfguardrailsbypass

labsctfjailbreakingadaptive

Jailbreak Gauntlet: Level 2 — Adaptive Defenses

Break through adaptive defense systems that learn from your previous attack attempts.

Model Whisperer Challenge

Extract the exact system prompt from 8 progressively harder LLM applications using creative extraction techniques.

whispererlabsmodelctf

ctfsmugglerchallengepromptlabs

Prompt Smuggler Challenge

Smuggle a specific payload through 5 layers of increasingly sophisticated input filtering to capture the flag.

RAG Infiltrator: Level 2 — Enterprise Knowledge Base

Infiltrate and exfiltrate data from a multi-tier enterprise RAG system with access controls.

labsctfragenterprise

Shadow Agent Challenge

Take covert control of a multi-agent system by poisoning inter-agent communication without triggering monitors.

agentlabsshadowctf

labsctfsupply-chainregistry

Supply Chain Saboteur: Level 2 — Model Registry

Compromise a model registry to inject backdoored model weights into a deployment pipeline.

Time Traveler Challenge

Exploit conversation memory and session management to manipulate the temporal context of an AI assistant.

timelabstravelerctf

labsctfadversarialoptimization

Token Wars: Adversarial Optimization Challenge

Optimize adversarial tokens to achieve target model behaviors under strict character limits.

Tool Master Challenge

Exploit the tool-use capabilities of an agent system to access unauthorized functions and capture hidden flags.

toollabsmasterctf

Agent Maze Runner: Multi-Tool Navigation

Navigate a maze of agent tools, each with unique vulnerabilities, to reach and exfiltrate a hidden flag.

labsctfagentmulti-tool

AI Wargame: Attacker vs Defender Real-Time

Compete in a real-time wargame where you attack an LLM system while an AI defender adapts to your techniques.

labsctfwargamereal-time

Classifier Gauntlet: 10-Stage Evasion

Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.

labsctfclassifierevasion

labsctfexfiltrationmulti-channel

Data Heist: Multi-Channel Exfiltration

Exfiltrate a secret key through a heavily monitored LLM system using covert exfiltration channels.

labsctfdefenseconstruction

Defense Builder: Construct and Survive

Build defensive guardrails for an LLM application and survive 100 automated attack rounds.

AI Escape Room: Agent Breakout Challenge

Break an AI agent out of its sandboxed environment by chaining tool-use vulnerabilities and injection techniques.

labsctfagentescape-room

labsctffunction-callingfortress

Function Calling Fortress Breach

Bypass function calling restrictions to invoke unauthorized tools and extract a flag from a sandboxed agent.

labsctfguardrailsspeedrun

Guardrail Speedrun: Fastest Bypass Challenge

Bypass 5 different guardrail implementations as fast as possible in a timed competition format.

labsctfhidden-payloadencoding

Invisible Ink: Hidden Payload Extraction Challenge

Extract a hidden flag by decoding multiple layers of obfuscation in model responses across encoded channels.

labsctfmcpserver-takeover

MCP Server Takeover Challenge

Compromise an MCP server through tool description injection and use it to pivot to other connected services.

Multimodal Cipher: Cross-Modal Decryption

Decode a flag split across text, image, and audio inputs processed by a multimodal AI system.

labsctfmultimodalcipher

labsctfpolyglotmultilingual

Polyglot Injection Challenge

Craft a single injection payload that works across 5 different languages and 3 different model providers.

labsctfsystem-promptarchaeology

Prompt Archaeology: Deep System Prompt Recovery

Recover fragments of a deeply hidden, multi-layer system prompt using various extraction techniques.

RAG Treasure Hunt: Vector DB Infiltration

Infiltrate a vector database by crafting documents that surface for specific queries to reveal flag fragments.

labsctfragvector-db

labsctfreasoningchain-hijack

Reasoning Chain Hijack: CoT Exploitation CTF

Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.

labsctfred-vs-bluetournament

Red vs Blue Tournament Challenge

Alternate between attacking and defending LLM systems in a scored tournament format.

labsctfstealthexfiltration

Stealth Exfiltration Challenge: Zero Detection

Exfiltrate data from a monitored LLM system without triggering any of the deployed anomaly detectors.

labsctfsupply-chainbackdoor-detection

Supply Chain Detective: Find the Backdoor

Analyze a model pipeline to find where a backdoor was inserted — data, training, or post-processing.

labsctfsleeper-agentdetection

Time Bomb Defusal: Sleeper Agent Detection

Detect and neutralize a sleeper agent behavior trigger hidden in a fine-tuned model before it activates.

labsctftoken-smugglingadversarial

Token Smuggler Extreme: Adversarial Token Crafting

Craft adversarial token sequences under extreme character limits to achieve target model behaviors.

labsteeringexpertlabsactivation

Activation Steering for Exploitation

Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.

labsbackdoordetection-evasionexpert

Backdoor Detection Evasion

Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.

architecturecrosstransferlabexpertlabs

Cross-Architecture Transfer Lab

Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.

exploitationlabexpertbehavioremergentlabs

Emergent Behavior Exploitation

Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.

labexpertformalverificationlabslimits

Formal Verification Limits Lab

Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.

labexpertguidedattacklabsinterpretability

Interpretability-Guided Attack Design

Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.

labsmodel-extractionapiexpert

Model Extraction via API Access

Extract a functionally equivalent model using only API query access.

routingmoeexploitationlabexpertlabs

MoE Routing Exploitation Lab

Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

neurallablabsexpertbackdoorengineering

Neural Backdoor Engineering

Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.

labsquantizationexploitationexpert

Quantization-Induced Vulnerability Exploitation

Exploit behavioral differences between full-precision and quantized models.

labsreasoningchain-of-thoughtexpert

Reasoning Model Chain-of-Thought Exploitation

Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.

labsrepresentation-engineeringattacksexpert

Representation Engineering Attacks

Manipulate internal model representations to alter behavior without prompt modification.

attackstimelabexperttraininglabs

End-to-End Training Time Attacks

Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.

labsactivation-patchingbehavior-modificationexpert

Activation Patching for Targeted Behavior Modification

Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.

labsalignment-fakingdetectionexpert

Alignment Faking Detection Lab

Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.

labscontinuous-learningpipeline-poisoningexpert

Continuous Learning Pipeline Poisoning

Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.

labsdeceptive-alignmenttestingexpert

Deceptive Alignment Testing Framework

Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.

labsemergent-capabilitiessecurity-probingexpert

Emergent Capability Security Probing

Design evaluations that discover security-relevant emergent capabilities in frontier language models.

labsfederated-learningpoisoningexpert

Federated Learning Poisoning Attack

Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.

labsformal-verificationsafety-propertiesexpert

Formal Verification of LLM Safety Properties

Apply formal verification methods to prove or disprove safety properties of LLM-based applications.

labsgradient-baseduniversal-promptexpert

Gradient-Based Universal Prompt Search

Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.

labskv-cachemanipulationinferenceexpert

KV Cache Manipulation Attacks

Manipulate key-value caches in inference servers to inject or alter cached context across requests.

labsmechanistic-interpretabilityexploit-discoveryexpert

Mechanistic Interpretability for Exploit Discovery

Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.

labsmembership-inferenceproductionexpert

Membership Inference Against Production LLMs

Implement membership inference attacks to determine whether specific data was used in training an LLM.

labsmodel-extractionknowledge-distillationexpert

Model Extraction via Knowledge Distillation

Extract a functionally equivalent model from a commercial API using systematic distillation queries.

labsmodel-mergingbackdoorpropagationexpert

Model Merging Backdoor Propagation

Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.

labsmoeroutingexploitationexpert

MoE Routing Exploitation

Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.

labsmulti-agentconsensus-poisoningexpert

Multi-Agent Consensus Poisoning

Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.

labsrepresentation-engineeringbehavior-steeringexpert

Representation Engineering for Behavior Steering

Use representation engineering to steer model behavior by manipulating activation vectors during inference.

labsreward-hackingrlhfexpert

Reward Hacking in RLHF Pipelines

Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.

labssafety-removalminimal-dataexpert

Minimal-Data Safety Fine-Tuning Removal

Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.

labsspeculative-decodingside-channelexpert

Speculative Decoding Side-Channel Attacks

Exploit speculative decoding implementations to extract information about draft and verifier model behavior.

labstoken-probabilitysystem-promptreconstructionexpert

Token Probability System Prompt Reconstruction

Reconstruct complete system prompts using token-by-token probability analysis through API access.

labswatermarkremovalexpert

LLM Watermark Removal Attacks

Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.

labshands-onpracticectfsimulationsexercises

Labs & Hands-On Practice

Progressive hands-on lab exercises spanning beginner to expert difficulty, including CTF challenges and full engagement simulations for AI red teaming skill development.

Intermediate AI Red Team Labs

Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.

labsintermediate

encodingchainsintermediateadvancedlablabs

Advanced Encoding Chain Attacks

Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.

labsfunction-callingadvancedintermediate

Lab: Advanced Function Calling Exploitation

Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.

labsadversarialprompt-generationintermediate

Lab: Automated Adversarial Prompt Generation

Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.

labsagentmemoryintermediate

Agent Memory Manipulation

Exploit persistent memory in LLM agents to plant false context that persists across sessions.

labsagent-workflowtestingintermediate

Lab: Agent Workflow Security Testing

Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.

intermediateabuselabpatternslabsapi

API Abuse Pattern Lab

Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.

labsapifuzzingintermediate

API Parameter Fuzzing for LLMs

Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.

labsapi-versioningsecurity-gapsintermediate

Lab: API Versioning Security Gaps

Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.

labscanary-tokenstestingintermediate

Lab: Canary Token Effectiveness Testing

Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.

labschain-of-thoughtexploitationintermediate

Lab: Chain-of-Thought Exploitation

Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.

labscicdpipeline-testingintermediate

Lab: CI/CD Pipeline Security for LLM Apps

Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.

labsclassifierevasion-advancedintermediate

Lab: Advanced Classifier Evasion

Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.

classifierintermediateevasionlablabs

ML Classifier Evasion Lab

Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.

labscloud-platformtestingintermediate

Lab: Cloud AI Platform Security Testing

Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.

labscode-generationsecurity-testingintermediate

Lab: Code Generation Security Testing

Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.

labscompliancetestingintermediate

Lab: AI Compliance Testing Fundamentals

Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.

labscontext-windowoptimizationintermediate

Lab: Context Window Attack Optimization

Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.

contextintermediatelabpoisoningwindowlabs

Context Window Poisoning Lab

Exploit context window management to inject persistent adversarial content that influences future model responses.

intermediatesplittingconversationlablabs

Conversation Splitting Attack Lab

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

labsconversationstate-poisoningintermediate

Conversation State Poisoning

Exploit conversation history manipulation to alter model behavior across turns.

labscross-contextinjectionintermediate

Cross-Context Injection

Inject prompts that persist across separate conversation contexts in shared deployments.

labswordlistsjailbreakingintermediate

Custom Jailbreak Wordlist Generation

Generate targeted jailbreak wordlists using mutation strategies and seed prompts.

labsexfiltrationside-channelsintermediate

Data Exfiltration via Side Channels

Extract data from LLM systems using timing, token probability, and formatting side channels.

labsdefense-bypasschainsintermediate

Building Defense Bypass Chains

Combine multiple bypass techniques into reliable chains that defeat layered defenses.

labsdefenseregression-testingintermediate

Lab: Defense Regression Testing Setup

Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.

intermediateinjectionlablabsragdocument

Document-Based RAG Injection Lab

Inject adversarial content into documents that will be processed by a RAG system to influence model responses.

labsembeddingmanipulation-advancedintermediate

Lab: Advanced Embedding Manipulation

Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.

intermediatelabpracticalmanipulationembeddinglabs

Practical Embedding Manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.

labsembeddingssimilarityintermediate

Embedding Similarity Attacks

Manipulate text to achieve target embedding similarity scores for retrieval poisoning.

fineintermediatetuninglablabsapi

Fine-Tuning API Security Probing

Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.

labsfine-tuningsafety-testingintermediate

Lab: Fine-Tuning Safety Impact Testing

Measure how fine-tuning affects model safety by comparing pre and post fine-tuning safety benchmark scores.

labsguardrail-bypasstechniquesintermediate

Lab: Guardrail Bypass Technique Laboratory

Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.

labsguardrailsfingerprintingintermediate

Guardrail Fingerprinting

Systematically map the rules and thresholds of input/output guardrail systems.

guardrailruleintermediatelablabsextraction

Guardrail Rule Extraction Lab

Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.

labshallucinationexploitationintermediate

Lab: Exploiting LLM Hallucinations

Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.

labsincident-responsesimulationintermediate

Lab: AI Incident Response Simulation

Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.

labsindirect-injectionwebintermediate

Indirect Injection via Web Content

Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.

labsinstruction-followingmeasurementintermediate

Lab: Instruction Following Measurement

Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.

labsllm-judgemanipulationintermediate

LLM Judge Manipulation

Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.

labslogprobsanalysisintermediate

Lab: Log Probability Analysis for Attacks

Use token log probabilities to extract information about training data, system prompts, and model internals.

labsmcpadvanced-testingintermediate

Lab: Advanced MCP Protocol Testing

Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.

labsmcptool-poisoningintermediate

MCP Tool Poisoning Attack

Exploit MCP tool descriptions to inject instructions that redirect agent behavior.

labsmcp-fuzzingprotocolintermediate

Lab: Model Context Protocol Fuzzing

Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.

fingerprintingintermediatelabpracticallabsmodel

Practical Model Fingerprinting Lab

Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.

labsmonitoringsecurity-setupintermediate

Lab: LLM Security Monitoring Setup

Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.

labsmulti-agentcommunication-testingintermediate

Lab: Multi-Agent Communication Testing

Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.

intermediatebypasslablanguagelabsmulti

Multi-Language Safety Bypass Lab

Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

labsmultilingualinjectionintermediate

Multi-Language Injection Attacks

Exploit language switching and low-resource language gaps to bypass safety training.

labsmulti-modelchainingintermediate

Multi-Model Attack Chaining

Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.

labsmulti-modelcomparison-testingintermediate

Lab: Multi-Model Comparison Security Testing

Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.

pipelineintermediatelablabsmultimodel

Multi-Model Pipeline Attack Lab

Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.

labsmulti-turnstate-trackingintermediate

Lab: Multi-Turn State Tracking Attacks

Exploit conversation state tracking to build persistent attack context across multiple conversation turns.

labsmultimodalintermediate-testingintermediate

Lab: Intermediate Multimodal Security Testing

Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.

discoverychannelintermediateoutputlablabs

Output Channel Discovery Lab

Discover alternative output channels in LLM applications that bypass standard output filtering.

labsoutput-schemaexploitationintermediate

Lab: Output Schema Exploitation

Exploit structured output schemas to force models into revealing information through required field population.

labsprompt-cachingside-channelintermediate

Lab: Prompt Caching Side-Channel Attacks

Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.

intermediatetechniqueslabpromptlabsleakage

Prompt Leakage Technique Lab

Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.

labspyritautomatedintermediate

Automated Red Teaming with PyRIT

Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.

labsragadvanced-testingintermediate

Lab: Advanced RAG Security Testing

Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.

labsragpoisoningintermediate

RAG Context Poisoning

Poison a vector database to inject adversarial content into RAG retrieval results.

intermediatechainlabhijackingreasoninglabs

Reasoning Chain Hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

labsrole-hierarchyexploitationintermediate

Lab: Role Hierarchy Exploitation

Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.

labssafety-filterbenchmarkingintermediate

Lab: Safety Filter Benchmarking

Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.

craftingintermediatesemanticlabdeceptionlabs

Semantic Deception Crafting Lab

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.

labssemanticinjectionintermediate

Semantic Injection Crafting

Craft semantically coherent injections that evade both classifiers and human review.

labssemantic-searchmanipulationintermediate

Lab: Semantic Search Manipulation

Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.

labssession-isolationtestingintermediate

Lab: Session Isolation Testing

Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.

exploitationintermediatesessionlablabsstate

Session State Exploitation Lab

Exploit session management vulnerabilities in LLM applications to access other users data or escalate privileges.

labsstreamingresponse-manipulationintermediate

Lab: Streaming Response Manipulation

Exploit streaming response mechanisms to bypass output filters that only check complete responses.

intermediatebypassoutputlabstructuredlabs

Structured Output Safety Bypass

Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.

labsstructured-outputexploitationintermediate

Structured Output Exploitation

Exploit JSON mode and structured output constraints to bypass safety filters.

labssystem-promptreconstructionintermediate

Lab: System Prompt Reconstruction (Intermediate Lab)

Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.

labstoken-smugglingadvancedintermediate

Advanced Token Smuggling

Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.

labstool-resultinjectionintermediate

Lab: Tool Result Injection Attacks

Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.

labsunicodenormalization-bypassintermediate

Lab: Unicode Normalization Bypass Attacks

Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.

labsvector-dbaccess-controlintermediate

Lab: Vector Database Access Control Testing

Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.

labsagentgoal-hijackingintermediate

Agent Goal Hijacking

Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.

labsassistant-prefillinjectionintermediate

Assistant Prefill Injection Attacks

Exploit assistant message prefilling to prime model responses and bypass safety alignment.

labschat-historymanipulationintermediate

Chat History Manipulation Attacks

Inject false messages into conversation history to alter model behavior in subsequent turns.

labsclassifierboundary-mappingintermediate

Input Classifier Boundary Mapping

Systematically probe input classifiers to map their decision boundaries and find bypass regions.

labscontext-windowboundaryintermediate

Context Window Boundary Exploitation

Exploit the boundaries of context windows to push safety instructions beyond the model's attention.

labscrescendoimplementationintermediate

Crescendo Attack Implementation

Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.

labscross-pluginchainingintermediate

Cross-Plugin Exploitation Chaining

Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.

labsfunction-schemamanipulationintermediate

Function Schema Manipulation

Exploit function calling schemas to inject instructions through parameter descriptions and enums.

labsgarakcustom-probesintermediate

Building Custom Garak Probes

Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.

labsguardrailslatencydetectionintermediate

Guardrail Latency-Based Detection

Use timing side channels to identify and characterize guardrail implementations in LLM applications.

labsinspect-aicustom-evaluationsintermediate

Custom Inspect AI Safety Evaluations

Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.

labsjailbreakbenchcomparativeintermediate

JailbreakBench Comparative Testing

Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.

labsjson-modesafety-bypassintermediate

JSON Mode Safety Bypass

Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.

labsllm-guardevasionintermediate

LLM Guard Evasion Techniques

Develop evasion techniques against LLM Guard input scanners and output detectors.

labsmany-shotjailbreakingintermediate

Implementing Many-Shot Jailbreaking

Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.

labsmcpresourceexfiltrationintermediate

MCP Resource Data Exfiltration (Intermediate Lab)

Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.

labsmultilingualsafety-gapsintermediate

Multi-Language Safety Gap Exploitation

Systematically identify and exploit safety training gaps across low-resource languages.

labsmulti-turnstate-trackingintermediate

Multi-Turn State Tracking Exploitation

Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.

labsnemo-guardrailsbypassintermediate

NeMo Guardrails Bypass Techniques

Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.

labsoutput-filterbypassencodingintermediate

Output Filter Bypass via Encoding

Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.

labspdfinjectionragintermediate

PDF Document Injection for RAG Systems

Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.

labsprompt-templatevariable-injectionintermediate

Prompt Template Variable Injection

Inject adversarial content through template variable expansion in prompt management systems.

labspyritcustom-strategiesintermediate

PyRIT Custom Attack Strategies

Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.

labsragdocument-injectionintermediate

RAG Document Injection Campaign

Design and execute a document injection campaign against a RAG-powered application with vector search.

labssemantic-searchpoisoningembeddingsintermediate

Semantic Search Poisoning

Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.

labsstreamingresponseexfiltrationintermediate

Streaming Response Exfiltration

Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.

labslogprobssystem-promptintermediate

System Prompt Extraction via Token Probabilities

Exploit token probability APIs to reconstruct hidden system prompts character by character.

labstokenizerboundary-manipulationintermediate

Token Boundary Manipulation

Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.

labstool-resultinjectionintermediate

Tool Result Injection Attacks

Craft malicious tool return values that inject instructions back into the model's reasoning chain.

labsvisiontext-injectionimagesintermediate

Injecting Text via Images to VLMs

Embed adversarial text in images that vision-language models read and follow as instructions.

labswebhookexfiltrationintermediate

Lab: Data Exfiltration via Webhooks

Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.

vehiclesimautonomoussimulationslabs

Autonomous Vehicle AI Communication Assessment

Assess the AI communication and decision-making systems in an autonomous vehicle scenario.

labssimulationcode-reviewsecurity

Code Review Assistant Assessment

Test a code review AI for vulnerabilities in code analysis, suggestion generation, and repository access.

moderationsimcontentsimulationslabs

Content Moderation AI Assessment

Assess an AI content moderation system for bypass techniques, false positive manipulation, and adversarial content generation.

labssimulationcustomer-serviceagent

Customer Service Agent Red Team

Red team a customer service agent with tool access to order systems, refunds, and customer data.

labssimulationeducationtutor

Educational AI Tutor Assessment

Assess an AI tutoring system for content safety, answer manipulation, and student data exposure.

labssimulationfintechassessment

FinTech Chatbot Security Assessment

Conduct a full security assessment of a financial services chatbot handling sensitive transactions.

governmentservicessimsimulationslabs

Government Services AI Assessment

Assess an AI system that helps citizens navigate government services for data privacy and manipulation risks.

labssimulationhealthcarediagnostic

Healthcare Diagnostic AI Assessment

Assess a healthcare diagnostic AI for safety-critical vulnerabilities and data privacy compliance.

siminsurancesimulationslabsclaims

Insurance Claims AI Assessment

Conduct a complete red team assessment of an AI system that processes insurance claims and makes approval recommendations.

labssimulationlegalassessment

Legal AI Document Review Assessment

Assess a legal AI system that reviews contracts for vulnerabilities in document processing and privilege escalation.

simmedicaltriagesimulationslabs

Medical Triage AI Assessment

Assess an AI-powered medical triage system for safety-critical vulnerabilities and clinical decision manipulation.

labssimulationmulti-agentworkflow

Multi-Agent Workflow Assessment

Red team a multi-agent system with specialized agents communicating via A2A protocol.

simrecruitmentsimulationslabs

AI Recruitment System Assessment

Red team an AI recruitment system for bias exploitation, resume injection, and candidate data privacy issues.

smartsimhomesimulationslabs

Smart Home AI Assistant Assessment

Red team a smart home AI assistant with device control, user profile access, and purchase capabilities.

labssimulationsupply-chainpipeline

AI Supply Chain Pipeline Assessment

Assess the full ML pipeline from data ingestion through model deployment for supply chain attacks.

tradingsimsimulationslabsbot

AI Trading Bot Assessment

Red team an AI-powered trading system for market manipulation, data leakage, and adversarial input attacks.

labssimulationpentestingautomated-agent

Automated Pentesting Agent Assessment

Assess an AI-powered automated penetration testing agent for scope violations, unauthorized actions, and data handling.

labssimulationautonomous-vehiclesafety-critical

Autonomous Vehicle Planning AI Assessment

Assess the safety boundaries of an autonomous vehicle planning AI through adversarial scenario injection.

labssimulationcontent-moderationplatform

Content Moderation AI Platform Assessment

Assess an AI content moderation system for bypass techniques, false negative exploitation, and bias.

labssimulationcdpcustomer-data

Customer Data Platform AI Assessment

Red team an AI system integrated with a customer data platform handling PII, behavioral data, and segmentation.

labssimulationcyber-threat-intelsecurity

Cyber Threat Intelligence AI Assessment

Red team a cyber threat intelligence AI that processes IOCs, threat reports, and attack attribution.

labssimulationdata-analyticscopilot

Data Analytics Copilot Assessment

Red team a data analytics copilot with SQL generation capabilities and access to enterprise databases.

labssimulationdevopsinfrastructure

DevOps AI Assistant Security Assessment

Assess a DevOps AI assistant with access to CI/CD pipelines, cloud infrastructure, and deployment systems.

labssimulationenterpriseknowledge-management

Enterprise Knowledge Management AI Assessment

Assess an enterprise knowledge management system with role-based access, document permissions, and multi-tenant isolation.

labssimulationgovernmentcitizen-services

Government Services Chatbot Assessment

Red team a government citizen services chatbot that accesses tax records, benefits, and identity information.

labssimulationincident-responsesoc

AI-Powered Incident Response System Assessment

Red team an AI incident response system in a SOC environment with access to SIEM, EDR, and ticketing.

labssimulationinsuranceclaims

Insurance Claims Bot Security Assessment

Conduct a full security assessment of an insurance claims processing chatbot with access to policy and claims data.

labssimulationlegalcontract-review

Legal Contract Review AI Assessment

Red team an AI contract review system for privilege violations, document injection, and hallucinated clauses.

labssimulationmedicaltriage

Medical Triage Chatbot Assessment

Red team a medical triage chatbot for dangerous medical advice, data exposure, and safety-critical failures.

labssimulationmulti-agentresearch-team

Multi-Agent Research Team Assessment

Assess a multi-agent research team with specialized researcher, writer, and reviewer agents communicating via A2A.

labssimulationreal-estatevaluation

Real Estate Valuation AI Assessment

Assess an AI real estate valuation system for manipulation of property valuations and data exposure.

labssimulationrecruitinghr

AI Recruiting Assistant Assessment

Red team an AI recruiting assistant that screens resumes, schedules interviews, and accesses candidate data.

labssimulationsmart-homeiot

Smart Home AI Assistant Assessment (Simulation)

Assess a smart home AI assistant that controls lights, locks, cameras, and thermostat via tool integrations.

labssimulationsupply-chainoptimization

Supply Chain Optimization AI Assessment

Assess an AI supply chain optimization system for manipulation of demand forecasts and routing decisions.

labssimulationtradingfinance

AI Trading Assistant Security Assessment

Conduct a security assessment of an AI trading assistant with access to portfolio data and trade execution.

labssimulationtravelbooking-agent

Travel Booking Agent Red Team

Red team a travel booking AI agent with access to payment systems, loyalty programs, and personal data.