# lab

labtoolscomparisonmethodology

Lab: Tool Comparison — Same Target, 4 Tools

Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Evaluate coverage, speed, finding quality, and practical trade-offs of each tool.

labembeddingshands-onpythonintermediate

Lab: Exploring Embedding Spaces

Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.

labreconnaissanceattack-surfacemappingintermediate

Lab: Mapping an AI System's Attack Surface

Hands-on lab walking through reconnaissance of an AI system — identifying components, mapping data flows, enumerating tools, and documenting the attack surface.

Lab: Simulated Robot Control Exploitation

Hands-on lab exercises exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains using PyBullet.

labrobotsimulation

advancedlabragmanipulationlabs

Advanced RAG Manipulation Lab

Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.

advancedlabadversarialrobustnessevallabs

Adversarial Robustness Evaluation

Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.

Lab: Generating Adversarial Suffixes

Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.

labadversarial-suffixgcg

labattack-chainexploitmulti-stageadvanced

Lab: Attack Chain Construction

Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.

labaudioadversarialmultimodaladvancedhands-on

Lab: Audio Adversarial Examples

Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.

autodanadvancedlabimplementationlabs

AutoDAN Implementation Lab

Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.

labbackdoordetectionforensicsfine-tuning

Lab: Backdoor Detection in Fine-Tuned Models

Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.

Lab: Continuous Automated Red Teaming Pipeline

Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.

labcartcontinuous

labcloudassessmentend-to-endapi-securityadvancedhands-on

Lab: Cloud AI Security Assessment

Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.

labcloudassessmentinfrastructuresecurityadvanced

Lab: Cloud AI Assessment

Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.

labcontainer-securitybreakoutinfrastructure

Lab: Containerized Model Breakout

Explore techniques for escaping from containerized AI applications to the host system, testing container isolation boundaries in ML deployment environments.

labcross-modeltransfer-attacksmulti-model

Lab: Cross-Model Transfer Attacks

Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.

exploitationadvancedcrosslabpluginlabs

Cross-Plugin Exploitation Chain

Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.

classifiercustomadvancedlabtraininglabs

Custom Safety Classifier Training

Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.

labharnessproductionframework

Lab: Building a Production Red Team Harness

Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.

labcustom-harnesstesting-frameworkdomain-specificadvancedhands-on

Lab: Custom Test Harness for Specific Applications

Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.

defenseadvancedlabteamredlabs

Defense vs Red Team War Game

Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.

labensemble-attacksmulti-modelautomated-attacks

Lab: Ensemble Attacks

Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.

labfederated-learningpoisoningexperthands-on

Lab: Federated Learning Poisoning Attack

Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.

Lab: Inserting a Fine-Tuning Backdoor

Advanced lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.

labfine-tuningbackdoor

labfuzzingautomated-testingboundary-testing

Lab: Build an AI Fuzzing Harness

Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.

labgpuside-channelhardwareinformation-leakage

Lab: GPU Side-Channel Attacks

Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.

advancedlabgradientguidedpoisoninglabs

Gradient-Guided Data Poisoning

Use gradient information from open-source models to craft optimally poisoned training examples.

Lab: Chaining Guardrail Bypasses

Advanced lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.

labguardrailsbypass

labforensicsincident-responseinvestigation

Lab: AI Incident Investigation

Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.

labinference-serverinfrastructurevllmtriton

Lab: Inference Server Exploitation

Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.

labmcpexploitationtool-usefull-chainadvanced

Lab: MCP Full Exploitation

Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.

labmodel-comparisonsafety-evaluationbenchmarking

Lab: Model Security Comparison

Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.

distillationadvancedlabattacklabsmodel

Model Distillation Security Lab

Extract model capabilities through distillation techniques using only black-box API access.

labmodel-servinginfrastructuretensorflow-servingtorchserve

Lab: Model Serving Framework Attacks

Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.

labmulti-modelcomparativebenchmarking

Lab: Multi-Model Comparative Red Teaming

Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.

advancedlablabsattackmultiobjective

Multi-Objective Attack Optimization

Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.

labmulti-turnconversation-steeringcrescendoadvanced

Lab: Advanced Multi-Turn Attacks

Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.

chainadvancedlabmultimodalattacklabs

Multimodal Attack Chain Lab

Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.

labmultimodalpipelinevision

Lab: Multimodal Attack Pipeline

Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).

labpairautomatedattack-algorithm

Lab: Implementing PAIR Attack

Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.

labpairautomated-attacksjailbreaking

Lab: PAIR Attack Implementation

Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.

productionadvancedlabsimulationlabs

Production Environment Simulation Lab

Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.

labpurple-teamattack-defensecollaborativeadvancedhands-on

Lab: Purple Team Exercise

Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.

labpurple-teamattack-defensemetricsadvanced

Lab: Purple Team Exercise (Advanced Lab)

Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.

labragfull-chaindocument-injectionexfiltrationadvanced

Lab: RAG Full Chain Attack

Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.

labreasoningo1chain-of-thought

Lab: Reasoning Model Exploitation

Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.

laborchestrationautomationred-team-ops

Lab: Red Team Orchestration

Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.

labregression-testingsafetyautomationci-cd

Lab: Safety Regression Testing at Scale

Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.

probingadvancedlabrepresentationlabs

Representation Probing for Vulnerabilities

Probe internal model representations to identify exploitable features and develop representation-level attacks.

analysisadvancedlabrewardlabsmodel

Reward Model Analysis Lab

Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.

labroutingmodel-selectionarchitecture-attacks

Lab: Model Routing Exploitation

Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.

labtaptree-of-attacksalgorithm

Lab: TAP Algorithm Implementation

Implement the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.

labtraining-dataextractionprivacy

Lab: Training Data Extraction at Scale

Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.

labtransfer-attackscross-modeladversarialadvanced

Lab: Transfer Attack Development

Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.

labtransfer-attacksadversarialcross-modeladvancedhands-on

Lab: Transfer Attack Development (Advanced Lab)

Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.

labanthropicclaudeapibeginner

Lab: Anthropic Claude API Basics

Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.

explorationlabbeginnerlabsmultiapi

Multi-Provider API Exploration

Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.

labapi-keyssecuritysecrets-managementbeginner

Lab: API Key Security

Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.

Lab: API-Based Model Testing

Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.

labapitestingopenai

basiclabbeginnerlabsautomatedtesting

Basic Automated Testing Setup

Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.

labdefenseinput-filterregexbeginner

Lab: Build Your First Defense

Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.

testingdefensebasiclabbeginnerlabs

Basic Defense Mechanism Testing

Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.

labdefenseinput-filterbeginnerhands-on

Lab: Build Your First Defense (Beginner Lab)

Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.

reportingbasiclabbeginnerexerciselabs

Security Finding Documentation Exercise

Practice documenting security findings in a professional format with reproducible steps and impact assessment.

labmodel-comparisonsafetybenchmarkingbeginnerhands-on

Lab: Model Comparison

Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.

labcomparisonsafetymulti-modelbeginner

Lab: Compare Model Safety

Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.

labcontext-manipulationprompt-injectionpersonabeginnerhands-on

Lab: Context Manipulation

Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.

labcontext-overflowprompt-injectionattentionbeginner

Lab: Context Overflow Attacks

Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.

historyconversationlabbeginnerbasicslabs

Conversation History Analysis

Analyze how conversation history influences model behavior and identify opportunities for context manipulation.

labdefense-bypassevasionfiltersbeginnerhands-on

Lab: Defense Bypass Basics

Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.

labdefense-evasionfiltersbypass

Lab: Basic Defense Evasion

Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.

labdelimiter-injectionprompt-injectionxmlmarkdownbeginner

Lab: Delimiter Injection Attacks

Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.

labdelimiter-escapeprompt-injectionboundariesbeginnerhands-on

Lab: Delimiter Escape Attacks

Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.

securitylabbeginnerembeddingbasicslabs

Embedding Basics for Security

Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.

Lab: Encoding & Obfuscation Techniques

Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.

labencodingobfuscation

Lab: Setting Up Your Red Team Environment

Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.

labenvironmentsetuptools

errorlabbeginnerintelligencelabsmessage

Error Message Intelligence Gathering

Extract system architecture information from error messages and response patterns in LLM applications.

labethicsresponsibledocumentationbeginnerhands-on

Lab: Ethical Red Teaming

Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.

labethicsresponsible-testingdocumentationbeginner

Lab: Ethical Red Teaming (Beginner Lab)

Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.

labfew-shotin-context-learningprompt-injectionbeginner

Lab: Few-Shot Manipulation Attacks

Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.

labprompt-injectionbeginnerhands-on

Lab: Your First Prompt Injection

Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.

labjailbreakollamabeginnerhands-on

Lab: Your First Jailbreak

Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.

Lab: Scanning with Garak

Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.

labgarakscanningtools

labgaraktool-setupvulnerability-scannerbeginnerhands-on

Lab: Garak Setup and First Scan

Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.

injectionlabbeginnerlabshelloworld

Hello World Prompt Injection

Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.

labinjection-detectiondefenseclassificationbeginnerhands-on

Lab: Injection Detection Tool

Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.

labprompt-injectionsurveytechniquesbeginnerhands-on

Lab: Injection Techniques Survey

Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.

labinstruction-followingpriorityconflicting-instructionsbeginnerhands-on

Lab: Instruction Following Priority

Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.

labjailbreakbeginnertechniques

Lab: Basic Jailbreak Techniques

Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.

exploitationoutputlabbeginnerlabsjson

JSON Output Exploitation Basics

Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.

labmarkdown-injectiondata-exfiltrationrenderingbeginner

Lab: Markdown Injection

Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.

comparisonsecuritylabbeginnerlabsmodel

Model Security Comparison Lab

Compare the security posture of different LLM models by running identical test suites across providers.

labmulti-languageprompt-injectiontranslationbeginnerhands-on

Lab: Multi-Language Injection

Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.

labmulti-turnescalationsocial-engineeringbeginner

Lab: Multi-Turn Escalation Attacks

Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.

labollamasetuplocal-modelsbeginner

Lab: Setting Up Ollama for Local LLM Testing

Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.

outputformatlabbeginnercontrollabs

Output Format Control Lab

Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.

laboutput-formatexploitationdata-leakagebeginnerhands-on

Lab: Output Format Exploitation

Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.

laboutputformatmanipulation

Lab: Output Format Manipulation

Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.

laboutput-steeringprompt-injectionoutput-manipulationbeginnerhands-on

Lab: Output Steering

Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.

labpayload-craftingprompt-injectionred-teamingbeginnerhands-on

Lab: Payload Crafting

Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.

labencodingbypasscontent-filtersbeginner

Lab: Encoding Bypasses

Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.

labbeginnerlabsplaygroundtesting

LLM Playground Security Testing

Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.

labprompt-leakingsystem-promptextractionbeginnerhands-on

Lab: Prompt Leaking Basics

Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.

analysisstructurelabbeginnerpromptlabs

Prompt Structure Analysis Lab

Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.

labpromptfootool-setupevaluationbeginnerhands-on

Lab: Promptfoo Setup and First Eval

Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.

labpyrittool-setupred-teamingmicrosoftbeginnerhands-on

Lab: PyRIT Setup and First Attack

Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.

mappingratelabbeginnerlimitlabs

Rate Limit and Quota Mapping

Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.

analysisresponselabbeginnerbasicslabs

Response Analysis Fundamentals

Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.

labresponse-analysisfingerprintingsafety-refusalsbeginner

Lab: Analyzing LLM Responses

Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.

labrole-playpersonajailbreakbeginnerhands-on

Lab: Role-Play Attacks

Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.

Lab: Role-Play & Persona Attacks

Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.

labrole-playpersona

labsafetyboundariesmappingbeginnerhands-on

Lab: Mapping Safety Boundaries

Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.

safetymappinglabbeginnerboundarylabs

Safety Boundary Mapping

Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.

probingsafetylabbeginnertraininglabs

Safety Training Boundary Probing

Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.

labharnessautomationpython

Lab: Building a Simple Test Harness

Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.

labstructured-outputjsonschema-injectionbeginner

Lab: Structured Output Manipulation

Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.

labsystem-promptextractionreconnaissance

Lab: System Prompt Extraction

Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.

labprompt-injectionsystem-promptoverridebeginnerhands-on

Lab: System Prompt Override

Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.

labsystem-promptreconnaissanceextractionbeginner

Lab: System Prompt Reconstruction

Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.

labbeginnerconfigtoolinstallationlabs

Red Team Tool Installation and Configuration

Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.

Expert AI Red Team Labs

Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.

labexpertoverview

labsteeringexpertlabsactivation

Activation Steering for Exploitation

Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.

labexpertadversarial-mlgradientsfrom-scratch

Lab: Adversarial ML From Scratch

Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.

labexpertadversarial-suffixGCGoptimizationhands-on

Lab: Adversarial Suffix Optimization

Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.

labexpertalignmentstress-testingsafetyhands-on

Lab: Alignment Stress Testing

Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.

labexpertagentscannersecurityhands-on

Lab: Build Agent Security Scanner

Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.

labexpertfuzzertestingadversarialhands-on

Lab: Build an AI Fuzzer

Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.

labexpertbehavior-diffregressioncomparisonhands-on

Lab: Build Behavior Diff Tool

Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.

labexpertguardrailsevaluationdefense-testinghands-on

Lab: Build Guardrail Evaluator

Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.

labexpertautomationjailbreakframeworkhands-on

Lab: Build Jailbreak Automation

Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.

architecturecrosstransferlabexpertlabs

Cross-Architecture Transfer Lab

Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.

labexpertdefense-engineeringproductionmulti-layer

Lab: Defense Engineering Lab

Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.

exploitationlabexpertbehavioremergentlabs

Emergent Behavior Exploitation

Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.

labexpertemergentcapabilityprobinghands-on

Lab: Emergent Capability Probing

Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.

labexpertformalverificationlabslimits

Formal Verification Limits Lab

Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.

labexpertfull-stackexploit-chainmulti-layerhands-on

Lab: Full-Stack AI Exploitation

Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.

labexpertagentscomputer-useguihands-on

Lab: Computer Use Agent Exploitation

Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.

labexperthoneypotdeceptiondetectionhands-on

Lab: Deploy Honeypot AI

Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.

labexpertguidedattacklabsinterpretability

Interpretability-Guided Attack Design

Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.

routingmoeexploitationlabexpertlabs

MoE Routing Exploitation Lab

Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

labexpertmulti-agentcoordinationhands-on

Lab: Multi-Agent Attack Coordination

Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.

neurallablabsexpertbackdoorengineering

Neural Backdoor Engineering

Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.

labexpertjailbreakresearchnovel-attackshands-on

Lab: Novel Jailbreak Research

Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.

labexpertpipelinepoisoningsupply-chainhands-on

Lab: ML Pipeline Poisoning

Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.

labexpertquantizationmodel-compressionhands-on

Lab: Exploiting Quantized Model Weaknesses

Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.

labexpertregistrysupply-chainmodel-integrityhands-on

Lab: Model Registry Compromise

Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.

labexpertresearchmethodologyvulnerability-discovery

Lab: Vulnerability Research Methodology

Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.

labexpertrlhfreward-hackingalignmenthands-on

Lab: RLHF Reward Hacking

Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.

labexpertbenchmarkevaluationsafetyhands-on

Lab: Create a Safety Benchmark

Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

attackstimelabexperttraininglabs

End-to-End Training Time Attacks

Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.

labexpertwatermarkingdetectionhands-on

Lab: AI Watermark Detection & Removal

Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.

encodingchainsintermediateadvancedlablabs

Advanced Encoding Chain Attacks

Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.

labragpoisoningadvanceddata-attacks

Lab: Advanced RAG Poisoning

Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.

labagentsexploitationtool-use

Lab: Exploiting AI Agents

Hands-on lab for exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.

labagentsmemorypersistenceagentic-attacks

Lab: Agent Memory Manipulation

Hands-on lab for injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.

labagentsprompt-leakingtool-useintermediate

Lab: Agent Prompt Leaking

Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.

intermediateabuselabpatternslabsapi

API Abuse Pattern Lab

Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.

labautomationci-cdpromptfoopipelineintermediate

Lab: Automated Red Team Pipeline

Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.

labautomated-testingpromptfooci-cdintermediatehands-on

Lab: Automated Red Team Testing

Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.

labcloudazurecontent-filterevasioncloud-ai

Lab: Azure Content Filter Evasion

Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.

labcloudawsbedrockguardrailscloud-ai

Lab: AWS Bedrock Guardrails Testing

Hands-on lab for systematically testing and bypassing AWS Bedrock's built-in guardrails including content filters, denied topics, and word filters.

labragchunkingevasiondata-attacks

Lab: Chunking Exploitation

Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.

labragcitationfabricationhallucinationdata-attacks

Lab: Citation Fabrication

Hands-on lab for getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.

classifierintermediateevasionlablabs

ML Classifier Evasion Lab

Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.

labcontext-overflowattentioncontext-window

Lab: Context Window Overflow Attacks

Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.

contextintermediatelabpoisoningwindowlabs

Context Window Poisoning Lab

Exploit context window management to inject persistent adversarial content that influences future model responses.

labcontext-windowstuffing

Lab: Context Window Stuffing Attacks

Hands-on lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.

intermediatesplittingconversationlablabs

Conversation Splitting Attack Lab

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

labdata-exfiltrationcovert-channelsmarkdownintermediate

Lab: Data Exfiltration Channels

Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.

labdata-exfiltrationchannelsside-channelintermediatehands-on

Lab: Data Exfiltration Channels (Intermediate Lab)

Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.

labdata-exfiltrationleakageagents

Lab: Data Exfiltration Techniques

Hands-on lab for extracting sensitive data from AI systems including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.

labdefense-bypasssystematic

Lab: Systematic Defense Bypass

Hands-on lab for methodically identifying, categorizing, and bypassing common LLM defense mechanisms including input filters, output filters, and system prompt hardening.

labdefensemetricsmeasurementguardrailsintermediate

Lab: Defense Effectiveness Measurement

Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.

labdefense-testingguardrailsmetricsintermediatehands-on

Lab: Defense Effectiveness Testing

Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.

intermediateinjectionlablabsragdocument

Document-Based RAG Injection Lab

Inject adversarial content into documents that will be processed by a RAG system to influence model responses.

intermediatelabpracticalmanipulationembeddinglabs

Practical Embedding Manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.

labembeddingsmanipulationvectors

Lab: Embedding Space Manipulation

Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.

labencodingbypassobfuscationsystematic-testingintermediate

Lab: Encoding Bypass Lab

Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.

labencodingevasionobfuscationfilters

Lab: Encoding Evasion Techniques

Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.

labfew-shotinjectionprompt-engineering

Lab: Few-Shot Example Injection

Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.

fineintermediatetuninglablabsapi

Fine-Tuning API Security Probing

Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.

labfunction-callingtool-use

Lab: Function Calling & Tool Use Abuse

Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.

labfunction-callinginjectionagentsagentic-attacks

Lab: Function Calling Injection

Hands-on lab for exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.

guardrailruleintermediatelablabsextraction

Guardrail Rule Extraction Lab

Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.

Lab: Image-Based Prompt Injection

Hands-on lab exploring how text instructions embedded in images can be used to perform prompt injection against vision-language models (VLMs) that process visual input.

labimageinjection

labindirect-injectionexternal-datadocumentsintermediatehands-on

Lab: Indirect Prompt Injection

Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.

labindirect-injectionchainsweb

Lab: Indirect Prompt Injection Chains

Hands-on lab for setting up indirect prompt injection scenarios through web pages, emails, and documents, testing multi-hop injection chains against AI systems.

labjsoninjectionstructured-inputparsing

Lab: JSON Input Injection

Hands-on lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior, bypass schema validation, and exploit parsing inconsistencies.

labllm-judgeevaluationautomation

Lab: Building an LLM Judge Evaluator

Hands-on lab for building an LLM-based evaluator to score red team attack outputs, compare model vulnerability, and lay the foundation for automated attack campaigns.

labmarkdowninjectionrenderingexfiltration

Lab: Markdown-Based Injection

Hands-on lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content, exfiltrate data through image tags, and manipulate displayed information.

Lab: MCP Tool Abuse Scenarios

Hands-on lab for exploiting Model Context Protocol tool definitions through malicious tool descriptions, tool confusion attacks, and server impersonation.

labmcptool-abuseagents

labmcpexploitationtool-shadowingconsent-phishingagentic-attacks

Lab: MCP Server Exploitation

Hands-on lab for attacking a vulnerable MCP server, demonstrating tool shadowing, consent phishing, and server impersonation techniques.

Lab: Agent Memory Poisoning

Hands-on lab exploring how conversational memory in AI agents can be poisoned to alter future behavior, inject persistent instructions, and exfiltrate data across sessions.

labmemorypoisoning

labragmetadatainjectiondata-attacks

Lab: RAG Metadata Injection

Hands-on lab for exploiting metadata fields like titles, descriptions, and timestamps to manipulate RAG retrieval ranking and influence responses.

labmodel-extractionstealingapi

Lab: Basic Model Extraction

Hands-on lab for API-based model extraction attacks, querying a target model to approximate its behavior, measuring fidelity, and understanding query budgets.

fingerprintingintermediatelabpracticallabsmodel

Practical Model Fingerprinting Lab

Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.

intermediatebypasslablanguagelabsmulti

Multi-Language Safety Bypass Lab

Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

pipelineintermediatelablabsmultimodel

Multi-Model Pipeline Attack Lab

Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.

labmulti-turnescalationprogressivesocial-engineering

Lab: Multi-Turn Progressive Escalation

Hands-on lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment and boundary pushing.

labmulti-turncampaignspersistence

Lab: Multi-Turn Attack Campaigns

Hands-on lab for executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.

labmultimodalvisionimage-injectionintermediate

Lab: Multimodal Injection

Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.

labmultimodalimage-injectionvisionintermediatehands-on

Lab: Multimodal Injection (Intermediate Lab)

Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.

discoverychannelintermediateoutputlablabs

Output Channel Discovery Lab

Discover alternative output channels in LLM applications that bypass standard output filtering.

labagentspermission-escalationprivilege-escalationagentic-attacks

Lab: Agent Permission Escalation

Hands-on lab for tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.

intermediatetechniqueslabpromptlabsleakage

Prompt Leakage Technique Lab

Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.

Lab: Regression Testing with promptfoo

Hands-on lab for setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.

labpromptfooregression

Lab: Running PyRIT Campaigns

Hands-on lab for configuring and running automated red teaming campaigns using Microsoft's PyRIT framework against LLM applications.

labpyritcampaigns

labragpoisoningdata-attacks

Lab: RAG Pipeline Poisoning

Hands-on lab for setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.

intermediatechainlabhijackingreasoninglabs

Reasoning Chain Hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

labrecursive-injectionchainsfeedback-loops

Lab: Recursive Injection Chains

Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.

labragrerankingmanipulationdata-attacks

Lab: Re-ranking Attacks

Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.

craftingintermediatesemanticlabdeceptionlabs

Semantic Deception Crafting Lab

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.

exploitationintermediatesessionlablabsstate

Session State Exploitation Lab

Exploit session management vulnerabilities in LLM applications to access other users data or escalate privileges.

intermediatebypassoutputlabstructuredlabs

Structured Output Safety Bypass

Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.

labsupply-chainauditdependenciessecurityintermediatehands-on

Lab: Supply Chain Audit

Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.

labsupply-chainsecurityscanningdependenciesintermediate

Lab: ML Supply Chain Scan

Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.

labtoken-smugglingencoding

Lab: Token Smuggling & Encoding Bypass

Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.

labtool-poisoningagentsindirect-injectionagentic-attacks

Lab: Tool Result Poisoning

Hands-on lab for poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.

labtranslationbypassmultilinguallow-resource

Lab: Language Translation Bypass

Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.

labprompt-injectiontokenizerunicode

Lab: Advanced Token Smuggling via Unicode Normalization

Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.

labagentsworkflowhijackingagentic-attacks

Lab: Workflow Hijacking

Hands-on lab for redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.

labaudioadversarialhands-on

Lab: Crafting Audio Adversarial Examples

Hands-on lab creating adversarial audio examples using Python audio processing, targeting Whisper transcription with injected text.

labmultimodalattack-chain

Lab: Multi-Modal Attack Chain

Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.

labvideoadversarialhands-on

Lab: Video Model Adversarial Attacks

Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.

labimage-injectionhands-onvlm

Lab: Crafting Image-Based Injections

Hands-on lab for creating image-based prompt injections, testing against VLMs, and measuring success rates across different injection techniques.

operationslabsetupred-team

Red Team Lab & Operations

Operational foundations for AI red teaming: lab environments, evidence handling, engagement workflows, and team management for professional AI security assessments.

labfederated-learninghands-onflowerbyzantinemodel-poisoningsimulation

Lab: Attacking Federated Learning

Hands-on lab implementing model poisoning attacks in a simulated federated learning setup using the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.

labquantizationhands-onjailbreaksafety-testingtransformers

Lab: Exploiting Quantized Models

Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and crafting quantization-aware exploits.

Lab: Inserting a Fine-Tuning Backdoor (Training Pipeline)

Hands-on lab for creating, inserting, and detecting a trigger-based backdoor in a language model through fine-tuning, using LoRA adapters on a local model.

labfine-tuningbackdoor

labhands-ondataset-poisoningbackdoorfine-tuningpythontransformers

Lab: Poisoning a Training Dataset

Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, backdoor trigger measurement, and troubleshooting guidance.