# lab
231 articles tagged with “lab”
Lab: Systematically Bypassing Guardrails
Hands-on lab for methodically probing, classifying, and bypassing input/output guardrails in production AI systems using a structured red team workflow.
Lab: Tool Comparison — Same Target, 4 Tools
Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Evaluate coverage, speed, finding quality, and practical trade-offs of each tool.
Lab: Exploring Embedding Spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
Lab: Mapping an AI System's Attack Surface
Hands-on lab walking through reconnaissance of an AI system — identifying components, mapping data flows, enumerating tools, and documenting the attack surface.
Lab: Simulated Robot Control Exploitation
Hands-on lab exercises exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains using PyBullet.
Advanced RAG Manipulation Lab
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: Generating Adversarial Suffixes
Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.
Lab: Attack Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Lab: Backdoor Detection in Fine-Tuned Models
Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.
Lab: Continuous Automated Red Teaming Pipeline
Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Containerized Model Breakout
Explore techniques for escaping from containerized AI applications to the host system, testing container isolation boundaries in ML deployment environments.
Lab: Cross-Model Transfer Attacks
Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Building a Production Red Team Harness
Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Lab: Ensemble Attacks
Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.
Lab: Federated Learning Poisoning Attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
Lab: Inserting a Fine-Tuning Backdoor
Advanced lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.
Lab: Build an AI Fuzzing Harness
Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.
Lab: GPU Side-Channel Attacks
Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Lab: Chaining Guardrail Bypasses
Advanced lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.
Lab: AI Incident Investigation
Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.
Lab: Inference Server Exploitation
Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Serving Framework Attacks
Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.
Lab: Multi-Model Comparative Red Teaming
Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Multimodal Attack Pipeline
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
Lab: Implementing PAIR Attack
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.
Lab: PAIR Attack Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Model Exploitation
Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
Lab: Red Team Orchestration
Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.
Lab: Safety Regression Testing at Scale
Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Lab: Model Routing Exploitation
Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.
Lab: TAP Algorithm Implementation
Implement the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.
Lab: Training Data Extraction at Scale
Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API Key Security
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
Lab: API-Based Model Testing
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Lab: Build Your First Defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Compare Model Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Basic Defense Evasion
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Encoding & Obfuscation Techniques
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
Lab: Setting Up Your Red Team Environment
Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Ethical Red Teaming (Beginner Lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Scanning with Garak
Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Lab: Basic Jailbreak Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Multi-Turn Escalation Attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Lab: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Lab: Output Format Manipulation
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Role-Play & Persona Attacks
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Building a Simple Test Harness
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
Lab: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: System Prompt Extraction
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, promptfoo, and custom Python harnesses.
Expert AI Red Team Labs
Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.
Activation Steering for Exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Lab: Adversarial ML From Scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Lab: Build Agent Security Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: Defense Engineering Lab
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: Full-Stack AI Exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: Computer Use Agent Exploitation
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: Multi-Agent Attack Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML Pipeline Poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: Exploiting Quantized Model Weaknesses
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: RLHF Reward Hacking
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-End Training Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Lab: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced RAG Poisoning
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Lab: Exploiting AI Agents
Hands-on lab for exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.
Lab: Agent Memory Manipulation
Hands-on lab for injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.
Lab: Agent Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
API Abuse Pattern Lab
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
Lab: Automated Red Team Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Azure Content Filter Evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
Lab: AWS Bedrock Guardrails Testing
Hands-on lab for systematically testing and bypassing AWS Bedrock's built-in guardrails including content filters, denied topics, and word filters.
Lab: Chunking Exploitation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
Lab: Citation Fabrication
Hands-on lab for getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Context Window Overflow Attacks
Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Lab: Context Window Stuffing Attacks
Hands-on lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Lab: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Lab: Data Exfiltration Techniques
Hands-on lab for extracting sensitive data from AI systems including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.
Lab: Systematic Defense Bypass
Hands-on lab for methodically identifying, categorizing, and bypassing common LLM defense mechanisms including input filters, output filters, and system prompt hardening.
Lab: Defense Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.
Lab: Embedding Space Manipulation
Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.
Lab: Encoding Bypass Lab
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Lab: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
Lab: Few-Shot Example Injection
Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Function Calling & Tool Use Abuse
Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.
Lab: Function Calling Injection
Hands-on lab for exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Image-Based Prompt Injection
Hands-on lab exploring how text instructions embedded in images can be used to perform prompt injection against vision-language models (VLMs) that process visual input.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Lab: Indirect Prompt Injection Chains
Hands-on lab for setting up indirect prompt injection scenarios through web pages, emails, and documents, testing multi-hop injection chains against AI systems.
Lab: JSON Input Injection
Hands-on lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior, bypass schema validation, and exploit parsing inconsistencies.
Lab: Building an LLM Judge Evaluator
Hands-on lab for building an LLM-based evaluator to score red team attack outputs, compare model vulnerability, and lay the foundation for automated attack campaigns.
Lab: Markdown-Based Injection
Hands-on lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content, exfiltrate data through image tags, and manipulate displayed information.
Lab: MCP Tool Abuse Scenarios
Hands-on lab for exploiting Model Context Protocol tool definitions through malicious tool descriptions, tool confusion attacks, and server impersonation.
Lab: MCP Server Exploitation
Hands-on lab for attacking a vulnerable MCP server, demonstrating tool shadowing, consent phishing, and server impersonation techniques.
Lab: Agent Memory Poisoning
Hands-on lab exploring how conversational memory in AI agents can be poisoned to alter future behavior, inject persistent instructions, and exfiltrate data across sessions.
Lab: RAG Metadata Injection
Hands-on lab for exploiting metadata fields like titles, descriptions, and timestamps to manipulate RAG retrieval ranking and influence responses.
Lab: Basic Model Extraction
Hands-on lab for API-based model extraction attacks, querying a target model to approximate its behavior, measuring fidelity, and understanding query budgets.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn Progressive Escalation
Hands-on lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment and boundary pushing.
Lab: Multi-Turn Attack Campaigns
Hands-on lab for executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Agent Permission Escalation
Hands-on lab for tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Lab: Regression Testing with promptfoo
Hands-on lab for setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.
Lab: Running PyRIT Campaigns
Hands-on lab for configuring and running automated red teaming campaigns using Microsoft's PyRIT framework against LLM applications.
Lab: RAG Pipeline Poisoning
Hands-on lab for setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Recursive Injection Chains
Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.
Lab: Re-ranking Attacks
Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: Token Smuggling & Encoding Bypass
Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.
Lab: Tool Result Poisoning
Hands-on lab for poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.
Lab: Language Translation Bypass
Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.
Lab: Advanced Token Smuggling via Unicode Normalization
Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Lab: Workflow Hijacking
Hands-on lab for redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.
Lab: Crafting Audio Adversarial Examples
Hands-on lab creating adversarial audio examples using Python audio processing, targeting Whisper transcription with injected text.
Lab: Multi-Modal Attack Chain
Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.
Lab: Video Model Adversarial Attacks
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
Lab: Crafting Image-Based Injections
Hands-on lab for creating image-based prompt injections, testing against VLMs, and measuring success rates across different injection techniques.
Red Team Lab & Operations
Operational foundations for AI red teaming: lab environments, evidence handling, engagement workflows, and team management for professional AI security assessments.
Lab: Attacking Federated Learning
Hands-on lab implementing model poisoning attacks in a simulated federated learning setup using the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
Lab: Exploiting Quantized Models
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and crafting quantization-aware exploits.
Lab: Inserting a Fine-Tuning Backdoor (Training Pipeline)
Hands-on lab for creating, inserting, and detecting a trigger-based backdoor in a language model through fine-tuning, using LoRA adapters on a local model.
Lab: Poisoning a Training Dataset
Hands-on lab demonstrating how fine-tuning on a poisoned dataset changes model behavior, with step-by-step Python code, backdoor trigger measurement, and troubleshooting guidance.