What is Environment Setup?

Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.

What is First Injection?

Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.

What is Jailbreak Basics?

Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.

What is Simple Test Harness?

Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.

What is Garak Scanning?

Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.

Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.

What is System Prompt Extraction?

Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.

What is Output Manipulation?

Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.

What is Defense Evasion 101?

Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.

What is Role-Play Attacks?

Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.

Skip to main content

redteams.ai

Topics Glossary Blog ATT&CK Navigator Challenges

Loading...

redteams.ai

Built with Next.js

Topics Glossary Tags Blog ATT&CK Navigator Challenges

Methodology Contribute Bookmarks RSS GitHub Contact

Privacy Cookies Terms Imprint

// stay adversarial

Hands-On Labs & CTF
Beginner Labs

Getting Started with AI Red Teaming Labs

beginner7 min readUpdated 2026-03-13

Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.

labs getting-started beginner setup

What You'll Learn

Understand the purpose and structure of AI red teaming labs
Identify the prerequisites needed to complete beginner labs
Navigate the lab progression from basic to advanced exercises
Set expectations for what you will learn across the beginner track

Welcome to the AI Red Teaming Labs

These labs provide hands-on experience with the techniques, tools, and methodologies used to evaluate the safety and robustness of large language models (LLMs). Unlike theoretical material, every lab in this series requires you to run real attacks against real models and observe the results firsthand.

Key Concept

AI red teaming is the practice of systematically probing AI systems to discover vulnerabilities, safety failures, and unintended behaviors. These labs teach you to do it responsibly, methodically, and effectively.

Who These Labs Are For

The beginner track is designed for practitioners who have:

Basic Python proficiency -- you can write functions, handle exceptions, and work with libraries
Foundational security awareness -- you understand concepts like input validation, authorization, and attack surfaces
Curiosity about LLM behavior -- you want to understand how language models fail and how to test them systematically

You do not need prior experience with machine learning, model internals, or advanced prompt engineering. The labs build these skills progressively.

Prerequisites

Before starting the labs, ensure you have the following ready:

Requirement	Minimum	Recommended
Python	3.9+	3.11+
RAM	8 GB	16 GB
Disk space	10 GB free	50 GB free (for local models)
API access	At least one LLM API key	OpenAI + Anthropic + local model
OS	Any (Linux, macOS, Windows with WSL)	Linux or macOS

Warning

Some labs require API credits. Start with free-tier or low-cost options. Local models via Ollama are free and sufficient for most beginner exercises.

How Labs Are Structured

Every lab in this series follows a consistent format:

Learning Objectives
Each lab begins with clear objectives so you know exactly what skills you will gain.
Prerequisites & Setup
Required tools, packages, and configurations are listed upfront. Complete these before attempting the exercises.
Background Context
Brief explanation of the technique or concept being explored, with links to deeper theory pages.
Step-by-Step Exercises
Detailed, numbered instructions that walk you through each attack or test. Every step includes the exact commands or code to run.
Expected Outputs
Sample outputs so you can verify your results match what is expected. Variations are noted where model behavior may differ.
Troubleshooting
Common issues and their solutions, so you spend time learning -- not debugging environment problems.
Knowledge Check
A quiz at the end of each lab to reinforce key concepts and verify understanding.

Beginner Lab Overview

The beginner track contains 11 hands-on labs that progress from environment setup through increasingly sophisticated attack techniques:

Foundation Labs

Lab	Title	What You Learn
1	Environment Setup	Install tools, configure API keys, verify your setup
2	Your First Prompt Injection	Basic prompt override techniques against a chatbot
3	Basic Jailbreak Techniques	Role-play, DAN-style, and framing-based jailbreaks

Tooling Labs

Lab	Title	What You Learn
4	Building a Simple Test Harness	Automate prompt testing with Python and CSV output
5	Scanning with Garak	Use the Garak framework for automated vulnerability scanning
6	API-Based Model Testing	Test models through OpenAI, Anthropic, and local APIs

Attack Technique Labs

Lab	Title	What You Learn
7	System Prompt Extraction	Extract hidden system prompts from deployed models
8	Output Format Manipulation	Force models into specific output formats for exploitation
9	Basic Defense Evasion	Bypass keyword filters and basic content classifiers
10	Role-Play & Persona Attacks	Craft persona-based attacks and test their effectiveness
11	Encoding & Obfuscation	Use encoding tricks to bypass model safety filters

Recommended Progression

Tip

Complete the labs in order. Each lab builds on skills and tools from previous ones. The test harness you build in Lab 4 is reused throughout the remaining labs.

While the labs are designed to be followed sequentially, here are some alternative paths based on your interests:

Tool-focused path: Labs 1, 4, 5, 6 -- focuses on building and using testing infrastructure
Attack-focused path: Labs 1, 2, 3, 7, 8, 10, 11 -- focuses on hands-on attack techniques
Defense-aware path: Labs 1, 2, 9, 8 -- focuses on understanding and bypassing defenses

Ethical Guidelines

Danger

These labs are for authorized security testing and educational purposes only. Never use these techniques against systems you do not own or have explicit written permission to test. Responsible disclosure applies to any vulnerabilities you discover.

All labs in this series follow responsible AI red teaming principles:

Test only what you are authorized to test -- your own deployments, or models with explicit testing permissions
Document everything -- maintain logs of all tests for accountability
Report vulnerabilities responsibly -- follow the vendor's disclosure process
Never weaponize findings -- the goal is to improve safety, not to cause harm
Respect rate limits and terms of service -- do not abuse API access

For a deeper discussion of ethics and legal considerations, see Red Team Ethics and Legal Considerations.

What Comes Next

After completing the beginner track, you will be ready for:

Intermediate Labs -- multi-step attacks, advanced jailbreaks, tool-use exploitation
Advanced Labs -- automated red teaming pipelines, model-specific attacks, fine-tuning exploits
CTF Challenges -- competitive capture-the-flag exercises to test your skills

References

"OWASP Top 10 for LLM Applications" - OWASP (2025) - Industry-standard classification of LLM security risks that maps to lab exercises
"AI Risk Management Framework" - NIST (2023) - Federal guidelines for identifying and managing AI risks, relevant to red team methodology
"Red Teaming Language Models with Language Models" - Perez et al. (2022) - Foundational paper on automated red teaming approaches
"Garak Documentation" - NVIDIA/garak (2024) - Official documentation for the Garak LLM vulnerability scanner used in Lab 5

Key Takeaway

The beginner labs provide a structured, hands-on introduction to AI red teaming. By completing them in order, you will build both the technical skills and the testing infrastructure needed for more advanced work. Every lab produces real, observable results -- there is no substitute for running the attacks yourself.

Knowledge Check

What is the recommended approach for completing the beginner labs?

Knowledge Check

Which of the following is NOT a prerequisite for the beginner labs?

Learning Path

0/155 completed

~2354 min total155 lessons

1
Environment Setupbeginner
Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.
8m
2
First Injectionbeginner
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
10m
3
Jailbreak Basicsbeginner
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
11m
4
Simple Test Harnessbeginner
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
12m
5
Garak Scanningbeginner
Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.
8m
6
API Testingbeginner
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
11m
7
System Prompt Extractionbeginner
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
10m
8
Output Manipulationbeginner
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
10m
9
Defense Evasion 101intermediate
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
11m
10
Role-Play Attacksbeginner
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
11m
11
Encoding & Obfuscationbeginner
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
10m
12
First Jailbreakbeginner
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
12m
13
Lab: Setting Up Ollama for Local LLM Testingbeginner
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
11m
14
Lab: Anthropic Claude API Basicsbeginner
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
11m
15
Safety Boundariesbeginner
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
10m
16
Lab: Structured Output Manipulationbeginner
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
10m
17
Prompt Leakingbeginner
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
10m
18
Lab: API Key Securitybeginner
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
13m
19
Role-Play Attacks (Hands-On)beginner
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
12m
20
Lab: Analyzing LLM Responsesbeginner
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
13m
21
Output Format Exploitationbeginner
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
11m
22
Build Your First Defensebeginner
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
12m
23
Lab: Context Overflow Attacksbeginner
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
11m
24
Ethical Red Teamingbeginner
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
12m
25
Lab: Delimiter Injection Attacksbeginner
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
11m
26
Lab: Markdown Injectionbeginner
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
11m
27
Model Comparisonbeginner
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
11m
28
Lab: Few-Shot Manipulation Attacksbeginner
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
13m
29
Lab: Injection Techniques Surveybeginner
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
14m
30
Lab: Encoding Bypassesbeginner
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
10m
31
Lab: System Prompt Overridebeginner
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
15m
32
Lab: Delimiter Escape Attacksbeginner
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
12m
33
Lab: Multi-Turn Escalation Attacksbeginner
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
11m
34
Lab: Instruction Following Prioritybeginner
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
14m
35
Lab: System Prompt Reconstructionbeginner
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
10m
36
Lab: Payload Craftingbeginner
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
14m
37
Lab: Context Manipulationbeginner
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
13m
38
Lab: Multi-Language Injectionbeginner
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
13m
39
Lab: Injection Detection Toolbeginner
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
14m
40
Lab: Output Steeringbeginner
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
14m
41
Lab: Build Your First Defensebeginner
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
12m
42
Lab: Defense Bypass Basicsbeginner
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
15m
43
Lab: Ethical Red Teaming (Beginner Lab)beginner
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
15m
44
Lab: Garak Setup and First Scanbeginner
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
12m
45
Lab: Compare Model Safetybeginner
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
11m
46
Lab: PyRIT Setup and First Attackbeginner
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
13m
47
Lab: Promptfoo Setup and First Evalbeginner
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
14m
48
Your First LLM API Call with OpenAIbeginner
Set up your Python environment and make your first LLM API call to understand request/response patterns.
18m
49
Your First Claude API Callbeginner
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
18m
50
Basic Role-Play Prompt Injectionbeginner
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
18m
51
System Prompt Extraction Fundamentalsbeginner
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
18m
52
Encoding and Obfuscation Basicsbeginner
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
18m
53
Output Format Manipulationbeginner
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
18m
54
Your First Garak Vulnerability Scanbeginner
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
18m
55
Setting Up Promptfoo for LLM Evaluationbeginner
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
18m
56
Temperature and Sampling Effects on Jailbreaksbeginner
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
18m
57
Basic Context Window Overflowbeginner
Fill the context window with padding content to push safety instructions out of the attention window.
18m
58
Multi-Turn Conversation Probingbeginner
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
18m
59
API Response Parsing and Analysisbeginner
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
17m
60
Few-Shot Injection Fundamentalsbeginner
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
18m
61
Delimiter Escape Techniquesbeginner
Practice escaping common delimiters used to separate system prompts from user input.
18m
62
Introduction to Safety Testingbeginner
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
17m
63
Instruction Hierarchy Testingbeginner
Test how models prioritize conflicting instructions between system, user, and assistant roles.
18m
64
LLM Playground Explorationbeginner
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
17m
65
Basic Data Exfiltration Techniquesbeginner
Extract sensitive information from LLM applications using social engineering and misdirection.
18m
66
Introduction to LLM Fuzzingbeginner
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
17m
67
Identifying LLM Defensesbeginner
Map the defensive layers of an LLM application through systematic probing and error analysis.
18m
68
Token Manipulation Basicsbeginner
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
16m
69
Embedding Fundamentals for Red Teamersbeginner
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
17m
70
Token Counting and Estimationbeginner
Understand tokenization by counting and estimating tokens across different models and encoders.
18m
71
API Rate Limit and Error Handlingbeginner
Test LLM API rate limits and implement proper error handling for automated testing workflows.
18m
72
Format String Injection in LLMsbeginner
Practice injecting format strings and template directives to manipulate LLM output structure and content.
17m
73
Multi-Language Prompt Testingbeginner
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
17m
74
Setting Up Payload Loggingbeginner
Build a payload logging system to track prompt injection attempts and model responses.
18m
75
Basic Classifier Evasionbeginner
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
18m
76
Multimodal Input Testing Basicsbeginner
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
16m
77
Introduction to Defense Testingbeginner
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
17m
78
Setting Up Automated LLM Testingbeginner
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
17m
79
Red Team Report Writing Basicsbeginner
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
17m
80
Evidence Collection for LLM Testingbeginner
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
17m
81
Vulnerability Scoring Fundamentalsbeginner
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
16m
82
Comparing Red Team Testing Toolsbeginner
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
17m
83
Testing Environment Hardeningbeginner
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
16m
84
System Prompt Enumeration Techniquesbeginner
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
17m
85
Response Consistency Testingbeginner
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
16m
86
Jailbreak Technique Taxonomybeginner
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
17m
87
Basic Model Fingerprintingbeginner
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
16m
88
Prompt Template Vulnerability Testingbeginner
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
17m
89
Error Message Analysis for Reconbeginner
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
17m
90
Rate Limit Enumeration and Bypassbeginner
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
16m
91
Conversation History Manipulationbeginner
Test how LLM applications handle conversation history including truncation, injection, and context window management.
17m
92
Detecting Output Filtersbeginner
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
16m
93
JSON Output Mode Security Testingbeginner
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
17m
94
Analyzing Model Refusal Patternsbeginner
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
16m
95
Social Engineering LLM Applicationsbeginner
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
17m
96
Hallucination Detection Basicsbeginner
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
16m
97
Content Policy Boundary Mappingbeginner
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
17m
98
Crafting Basic Adversarial Examplesbeginner
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
17m
99
API Authentication Security Testingbeginner
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
17m
100
Simple Payload Encoding Techniquesbeginner
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
17m
101
Designing LLM Red Team Test Casesbeginner
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
17m
102
Local Model Setup for Testingbeginner
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
17m
103
Running Safety Benchmarksbeginner
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
16m
104
Injection Attempt Log Analysisbeginner
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
16m
105
Temperature and Sampling Security Effectsbeginner
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
17m
106
Testing Prompt Leaking Defensesbeginner
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
17m
107
Basic RAG System Security Testingbeginner
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
17m
108
Prompt Leaking via Summarization Requestsbeginner
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
15m
109
Analyzing Refusal Messages for Intelbeginner
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
15m
110
Character Encoding Bypass Techniquesbeginner
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
15m
111
Prompt Injection via Translationbeginner
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
15m
112
Completion Hijacking Fundamentalsbeginner
Craft partial sentences that steer model completions toward attacker-desired outputs.
15m
113
Markdown Rendering Exfiltrationbeginner
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
15m
114
Your First LLM Guard Scanbeginner
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
15m
115
Temperature and Top-K Effects on Safetybeginner
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
15m
116
System Prompt Reconstruction from Cluesbeginner
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
15m
117
Error Message Exploitationbeginner
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
15m
118
API Response Header Analysisbeginner
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
15m
119
Basic Rate Limit Abuse Patternsbeginner
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
15m
120
Simple Output Constraint Attacksbeginner
Force models to output in constrained formats that bypass output safety filters.
15m
121
Conversation Reset Attacksbeginner
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
15m
122
Basic RAG Query Injectionbeginner
Craft user queries that manipulate RAG retrieval to surface unintended documents.
15m
123
Your First Inspect AI Evaluationbeginner
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
15m
124
JSON Injection Basicsbeginner
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
15m
125
XML Injection in LLM Contextsbeginner
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
15m
126
Introduction to NeMo Guardrailsbeginner
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
15m
127
Model Fingerprinting Basicsbeginner
Identify which LLM model powers an application through behavioral fingerprinting techniques.
15m
128
Hello World Prompt Injectionbeginner
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
18m
129
Safety Boundary Mapping Exercisebeginner
Systematically map the safety boundaries of an LLM application across multiple topic categories.
15m
130
Multi-Provider API Explorationbeginner
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
18m
131
Prompt Injection via File Namesbeginner
Embed prompt injection payloads in filenames and metadata of uploaded documents.
15m
132
Response Timing Side-Channel Analysisbeginner
Use response timing differences to infer information about model processing and guardrail activation.
15m
133
Safety Boundary Mappingbeginner
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
18m
134
Basic Payload Mutation Techniquesbeginner
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
15m
135
Response Analysis Fundamentalsbeginner
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
18m
136
Chatbot Persona and Capability Mappingbeginner
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
15m
137
LLM Playground Security Testingbeginner
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
18m
138
API Key Scope and Permission Testingbeginner
Test API key scoping and permission boundaries to identify over-privileged access configurations.
15m
139
Basic Indirect Prompt Injectionbeginner
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
15m
140
Model Security Comparison Labbeginner
Compare the security posture of different LLM models by running identical test suites across providers.
18m
141
Emoji and Unicode Injection Techniquesbeginner
Use emoji sequences and Unicode special characters to bypass text-based input filters.
15m
142
JSON Output Exploitation Basicsbeginner
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
18m
143
Your First HarmBench Evaluationbeginner
Run a standardized safety evaluation using the HarmBench framework against a target model.
15m
144
Conversation History Analysisbeginner
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
18m
145
System Prompt Extraction via Error Injectionbeginner
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
15m
146
Error Message Intelligence Gatheringbeginner
Extract system architecture information from error messages and response patterns in LLM applications.
18m
147
Basic Automated Testing Setupbeginner
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
18m
148
Embedding Basics for Securitybeginner
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
18m
149
Safety Training Boundary Probingbeginner
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
18m
150
Output Format Control Labbeginner
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
18m
151
Basic Defense Mechanism Testingbeginner
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
18m
152
Prompt Structure Analysis Labbeginner
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
18m
153
Rate Limit and Quota Mappingbeginner
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
18m
154
Security Finding Documentation Exercisebeginner
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
18m
155
Red Team Tool Installation and Configurationbeginner
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
18m

Start Learning

Getting Started with AI Red Teaming Labs

beginner7 min readUpdated 2026-03-13

Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.

labs getting-started beginner setup

What You'll Learn

Understand the purpose and structure of AI red teaming labs
Identify the prerequisites needed to complete beginner labs
Navigate the lab progression from basic to advanced exercises
Set expectations for what you will learn across the beginner track

Welcome to the AI Red Teaming Labs

These labs provide hands-on experience with the techniques, tools, and methodologies used to evaluate the safety and robustness of large language models (LLMs). Unlike theoretical material, every lab in this series requires you to run real attacks against real models and observe the results firsthand.

Key Concept

AI red teaming is the practice of systematically probing AI systems to discover vulnerabilities, safety failures, and unintended behaviors. These labs teach you to do it responsibly, methodically, and effectively.

Who These Labs Are For

The beginner track is designed for practitioners who have:

Basic Python proficiency -- you can write functions, handle exceptions, and work with libraries
Foundational security awareness -- you understand concepts like input validation, authorization, and attack surfaces
Curiosity about LLM behavior -- you want to understand how language models fail and how to test them systematically

You do not need prior experience with machine learning, model internals, or advanced prompt engineering. The labs build these skills progressively.

Prerequisites

Before starting the labs, ensure you have the following ready:

Requirement	Minimum	Recommended
Python	3.9+	3.11+
RAM	8 GB	16 GB
Disk space	10 GB free	50 GB free (for local models)
API access	At least one LLM API key	OpenAI + Anthropic + local model
OS	Any (Linux, macOS, Windows with WSL)	Linux or macOS

Warning

Some labs require API credits. Start with free-tier or low-cost options. Local models via Ollama are free and sufficient for most beginner exercises.

How Labs Are Structured

Every lab in this series follows a consistent format:

Learning Objectives
Each lab begins with clear objectives so you know exactly what skills you will gain.
Prerequisites & Setup
Required tools, packages, and configurations are listed upfront. Complete these before attempting the exercises.
Background Context
Brief explanation of the technique or concept being explored, with links to deeper theory pages.
Step-by-Step Exercises
Detailed, numbered instructions that walk you through each attack or test. Every step includes the exact commands or code to run.
Expected Outputs
Sample outputs so you can verify your results match what is expected. Variations are noted where model behavior may differ.
Troubleshooting
Common issues and their solutions, so you spend time learning -- not debugging environment problems.
Knowledge Check
A quiz at the end of each lab to reinforce key concepts and verify understanding.

Beginner Lab Overview

The beginner track contains 11 hands-on labs that progress from environment setup through increasingly sophisticated attack techniques:

Foundation Labs

Lab	Title	What You Learn
1	Environment Setup	Install tools, configure API keys, verify your setup
2	Your First Prompt Injection	Basic prompt override techniques against a chatbot
3	Basic Jailbreak Techniques	Role-play, DAN-style, and framing-based jailbreaks

Tooling Labs

Lab	Title	What You Learn
4	Building a Simple Test Harness	Automate prompt testing with Python and CSV output
5	Scanning with Garak	Use the Garak framework for automated vulnerability scanning
6	API-Based Model Testing	Test models through OpenAI, Anthropic, and local APIs

Attack Technique Labs

Lab	Title	What You Learn
7	System Prompt Extraction	Extract hidden system prompts from deployed models
8	Output Format Manipulation	Force models into specific output formats for exploitation
9	Basic Defense Evasion	Bypass keyword filters and basic content classifiers
10	Role-Play & Persona Attacks	Craft persona-based attacks and test their effectiveness
11	Encoding & Obfuscation	Use encoding tricks to bypass model safety filters

Recommended Progression

Tip

Complete the labs in order. Each lab builds on skills and tools from previous ones. The test harness you build in Lab 4 is reused throughout the remaining labs.

While the labs are designed to be followed sequentially, here are some alternative paths based on your interests:

Tool-focused path: Labs 1, 4, 5, 6 -- focuses on building and using testing infrastructure
Attack-focused path: Labs 1, 2, 3, 7, 8, 10, 11 -- focuses on hands-on attack techniques
Defense-aware path: Labs 1, 2, 9, 8 -- focuses on understanding and bypassing defenses

Ethical Guidelines

Danger

These labs are for authorized security testing and educational purposes only. Never use these techniques against systems you do not own or have explicit written permission to test. Responsible disclosure applies to any vulnerabilities you discover.

All labs in this series follow responsible AI red teaming principles:

Test only what you are authorized to test -- your own deployments, or models with explicit testing permissions
Document everything -- maintain logs of all tests for accountability
Report vulnerabilities responsibly -- follow the vendor's disclosure process
Never weaponize findings -- the goal is to improve safety, not to cause harm
Respect rate limits and terms of service -- do not abuse API access

For a deeper discussion of ethics and legal considerations, see Red Team Ethics and Legal Considerations.

What Comes Next

After completing the beginner track, you will be ready for:

Intermediate Labs -- multi-step attacks, advanced jailbreaks, tool-use exploitation
Advanced Labs -- automated red teaming pipelines, model-specific attacks, fine-tuning exploits
CTF Challenges -- competitive capture-the-flag exercises to test your skills

References

"OWASP Top 10 for LLM Applications" - OWASP (2025) - Industry-standard classification of LLM security risks that maps to lab exercises
"AI Risk Management Framework" - NIST (2023) - Federal guidelines for identifying and managing AI risks, relevant to red team methodology
"Red Teaming Language Models with Language Models" - Perez et al. (2022) - Foundational paper on automated red teaming approaches
"Garak Documentation" - NVIDIA/garak (2024) - Official documentation for the Garak LLM vulnerability scanner used in Lab 5

Key Takeaway

The beginner labs provide a structured, hands-on introduction to AI red teaming. By completing them in order, you will build both the technical skills and the testing infrastructure needed for more advanced work. Every lab produces real, observable results -- there is no substitute for running the attacks yourself.

Knowledge Check

What is the recommended approach for completing the beginner labs?

Knowledge Check

Which of the following is NOT a prerequisite for the beginner labs?

Learning Path

0/155 completed

~2354 min total155 lessons

1
Environment Setupbeginner
Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.
8m
2
First Injectionbeginner
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
10m
3
Jailbreak Basicsbeginner
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
11m
4
Simple Test Harnessbeginner
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
12m
5
Garak Scanningbeginner
Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.
8m
6
API Testingbeginner
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
11m
7
System Prompt Extractionbeginner
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
10m
8
Output Manipulationbeginner
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
10m
9
Defense Evasion 101intermediate
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
11m
10
Role-Play Attacksbeginner
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
11m
11
Encoding & Obfuscationbeginner
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
10m
12
First Jailbreakbeginner
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
12m
13
Lab: Setting Up Ollama for Local LLM Testingbeginner
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
11m
14
Lab: Anthropic Claude API Basicsbeginner
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
11m
15
Safety Boundariesbeginner
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
10m
16
Lab: Structured Output Manipulationbeginner
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
10m
17
Prompt Leakingbeginner
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
10m
18
Lab: API Key Securitybeginner
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
13m
19
Role-Play Attacks (Hands-On)beginner
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
12m
20
Lab: Analyzing LLM Responsesbeginner
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
13m
21
Output Format Exploitationbeginner
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
11m
22
Build Your First Defensebeginner
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
12m
23
Lab: Context Overflow Attacksbeginner
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
11m
24
Ethical Red Teamingbeginner
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
12m
25
Lab: Delimiter Injection Attacksbeginner
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
11m
26
Lab: Markdown Injectionbeginner
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
11m
27
Model Comparisonbeginner
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
11m
28
Lab: Few-Shot Manipulation Attacksbeginner
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
13m
29
Lab: Injection Techniques Surveybeginner
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
14m
30
Lab: Encoding Bypassesbeginner
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
10m
31
Lab: System Prompt Overridebeginner
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
15m
32
Lab: Delimiter Escape Attacksbeginner
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
12m
33
Lab: Multi-Turn Escalation Attacksbeginner
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
11m
34
Lab: Instruction Following Prioritybeginner
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
14m
35
Lab: System Prompt Reconstructionbeginner
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
10m
36
Lab: Payload Craftingbeginner
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
14m
37
Lab: Context Manipulationbeginner
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
13m
38
Lab: Multi-Language Injectionbeginner
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
13m
39
Lab: Injection Detection Toolbeginner
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
14m
40
Lab: Output Steeringbeginner
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
14m
41
Lab: Build Your First Defensebeginner
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
12m
42
Lab: Defense Bypass Basicsbeginner
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
15m
43
Lab: Ethical Red Teaming (Beginner Lab)beginner
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
15m
44
Lab: Garak Setup and First Scanbeginner
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
12m
45
Lab: Compare Model Safetybeginner
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
11m
46
Lab: PyRIT Setup and First Attackbeginner
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
13m
47
Lab: Promptfoo Setup and First Evalbeginner
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
14m
48
Your First LLM API Call with OpenAIbeginner
Set up your Python environment and make your first LLM API call to understand request/response patterns.
18m
49
Your First Claude API Callbeginner
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
18m
50
Basic Role-Play Prompt Injectionbeginner
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
18m
51
System Prompt Extraction Fundamentalsbeginner
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
18m
52
Encoding and Obfuscation Basicsbeginner
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
18m
53
Output Format Manipulationbeginner
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
18m
54
Your First Garak Vulnerability Scanbeginner
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
18m
55
Setting Up Promptfoo for LLM Evaluationbeginner
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
18m
56
Temperature and Sampling Effects on Jailbreaksbeginner
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
18m
57
Basic Context Window Overflowbeginner
Fill the context window with padding content to push safety instructions out of the attention window.
18m
58
Multi-Turn Conversation Probingbeginner
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
18m
59
API Response Parsing and Analysisbeginner
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
17m
60
Few-Shot Injection Fundamentalsbeginner
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
18m
61
Delimiter Escape Techniquesbeginner
Practice escaping common delimiters used to separate system prompts from user input.
18m
62
Introduction to Safety Testingbeginner
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
17m
63
Instruction Hierarchy Testingbeginner
Test how models prioritize conflicting instructions between system, user, and assistant roles.
18m
64
LLM Playground Explorationbeginner
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
17m
65
Basic Data Exfiltration Techniquesbeginner
Extract sensitive information from LLM applications using social engineering and misdirection.
18m
66
Introduction to LLM Fuzzingbeginner
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
17m
67
Identifying LLM Defensesbeginner
Map the defensive layers of an LLM application through systematic probing and error analysis.
18m
68
Token Manipulation Basicsbeginner
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
16m
69
Embedding Fundamentals for Red Teamersbeginner
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
17m
70
Token Counting and Estimationbeginner
Understand tokenization by counting and estimating tokens across different models and encoders.
18m
71
API Rate Limit and Error Handlingbeginner
Test LLM API rate limits and implement proper error handling for automated testing workflows.
18m
72
Format String Injection in LLMsbeginner
Practice injecting format strings and template directives to manipulate LLM output structure and content.
17m
73
Multi-Language Prompt Testingbeginner
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
17m
74
Setting Up Payload Loggingbeginner
Build a payload logging system to track prompt injection attempts and model responses.
18m
75
Basic Classifier Evasionbeginner
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
18m
76
Multimodal Input Testing Basicsbeginner
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
16m
77
Introduction to Defense Testingbeginner
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
17m
78
Setting Up Automated LLM Testingbeginner
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
17m
79
Red Team Report Writing Basicsbeginner
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
17m
80
Evidence Collection for LLM Testingbeginner
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
17m
81
Vulnerability Scoring Fundamentalsbeginner
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
16m
82
Comparing Red Team Testing Toolsbeginner
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
17m
83
Testing Environment Hardeningbeginner
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
16m
84
System Prompt Enumeration Techniquesbeginner
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
17m
85
Response Consistency Testingbeginner
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
16m
86
Jailbreak Technique Taxonomybeginner
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
17m
87
Basic Model Fingerprintingbeginner
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
16m
88
Prompt Template Vulnerability Testingbeginner
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
17m
89
Error Message Analysis for Reconbeginner
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
17m
90
Rate Limit Enumeration and Bypassbeginner
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
16m
91
Conversation History Manipulationbeginner
Test how LLM applications handle conversation history including truncation, injection, and context window management.
17m
92
Detecting Output Filtersbeginner
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
16m
93
JSON Output Mode Security Testingbeginner
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
17m
94
Analyzing Model Refusal Patternsbeginner
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
16m
95
Social Engineering LLM Applicationsbeginner
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
17m
96
Hallucination Detection Basicsbeginner
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
16m
97
Content Policy Boundary Mappingbeginner
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
17m
98
Crafting Basic Adversarial Examplesbeginner
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
17m
99
API Authentication Security Testingbeginner
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
17m
100
Simple Payload Encoding Techniquesbeginner
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
17m
101
Designing LLM Red Team Test Casesbeginner
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
17m
102
Local Model Setup for Testingbeginner
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
17m
103
Running Safety Benchmarksbeginner
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
16m
104
Injection Attempt Log Analysisbeginner
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
16m
105
Temperature and Sampling Security Effectsbeginner
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
17m
106
Testing Prompt Leaking Defensesbeginner
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
17m
107
Basic RAG System Security Testingbeginner
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
17m
108
Prompt Leaking via Summarization Requestsbeginner
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
15m
109
Analyzing Refusal Messages for Intelbeginner
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
15m
110
Character Encoding Bypass Techniquesbeginner
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
15m
111
Prompt Injection via Translationbeginner
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
15m
112
Completion Hijacking Fundamentalsbeginner
Craft partial sentences that steer model completions toward attacker-desired outputs.
15m
113
Markdown Rendering Exfiltrationbeginner
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
15m
114
Your First LLM Guard Scanbeginner
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
15m
115
Temperature and Top-K Effects on Safetybeginner
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
15m
116
System Prompt Reconstruction from Cluesbeginner
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
15m
117
Error Message Exploitationbeginner
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
15m
118
API Response Header Analysisbeginner
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
15m
119
Basic Rate Limit Abuse Patternsbeginner
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
15m
120
Simple Output Constraint Attacksbeginner
Force models to output in constrained formats that bypass output safety filters.
15m
121
Conversation Reset Attacksbeginner
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
15m
122
Basic RAG Query Injectionbeginner
Craft user queries that manipulate RAG retrieval to surface unintended documents.
15m
123
Your First Inspect AI Evaluationbeginner
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
15m
124
JSON Injection Basicsbeginner
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
15m
125
XML Injection in LLM Contextsbeginner
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
15m
126
Introduction to NeMo Guardrailsbeginner
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
15m
127
Model Fingerprinting Basicsbeginner
Identify which LLM model powers an application through behavioral fingerprinting techniques.
15m
128
Hello World Prompt Injectionbeginner
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
18m
129
Safety Boundary Mapping Exercisebeginner
Systematically map the safety boundaries of an LLM application across multiple topic categories.
15m
130
Multi-Provider API Explorationbeginner
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
18m
131
Prompt Injection via File Namesbeginner
Embed prompt injection payloads in filenames and metadata of uploaded documents.
15m
132
Response Timing Side-Channel Analysisbeginner
Use response timing differences to infer information about model processing and guardrail activation.
15m
133
Safety Boundary Mappingbeginner
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
18m
134
Basic Payload Mutation Techniquesbeginner
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
15m
135
Response Analysis Fundamentalsbeginner
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
18m
136
Chatbot Persona and Capability Mappingbeginner
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
15m
137
LLM Playground Security Testingbeginner
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
18m
138
API Key Scope and Permission Testingbeginner
Test API key scoping and permission boundaries to identify over-privileged access configurations.
15m
139
Basic Indirect Prompt Injectionbeginner
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
15m
140
Model Security Comparison Labbeginner
Compare the security posture of different LLM models by running identical test suites across providers.
18m
141
Emoji and Unicode Injection Techniquesbeginner
Use emoji sequences and Unicode special characters to bypass text-based input filters.
15m
142
JSON Output Exploitation Basicsbeginner
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
18m
143
Your First HarmBench Evaluationbeginner
Run a standardized safety evaluation using the HarmBench framework against a target model.
15m
144
Conversation History Analysisbeginner
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
18m
145
System Prompt Extraction via Error Injectionbeginner
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
15m
146
Error Message Intelligence Gatheringbeginner
Extract system architecture information from error messages and response patterns in LLM applications.
18m
147
Basic Automated Testing Setupbeginner
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
18m
148
Embedding Basics for Securitybeginner
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
18m
149
Safety Training Boundary Probingbeginner
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
18m
150
Output Format Control Labbeginner
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
18m
151
Basic Defense Mechanism Testingbeginner
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
18m
152
Prompt Structure Analysis Labbeginner
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
18m
153
Rate Limit and Quota Mappingbeginner
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
18m
154
Security Finding Documentation Exercisebeginner
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
18m
155
Red Team Tool Installation and Configurationbeginner
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
18m

Start Learning

Getting Started with AI Red Teaming Labs

Learning Objectives

Prerequisites & Setup

Background Context

Step-by-Step Exercises

Expected Outputs

Troubleshooting

Knowledge Check

Learning Path

Related articles

Getting Started with AI Red Teaming Labs

Learning Objectives

Prerequisites & Setup

Background Context

Step-by-Step Exercises

Expected Outputs

Troubleshooting

Knowledge Check

Learning Path

Related articles