# lab
標記為「lab」的 463 篇文章
Lab: Systematically Bypassing Guardrails
Hands-on lab for methodically probing, classifying, and bypassing input/output guardrails in production AI systems using a structured red team workflow.
Lab: Tool Comparison — Same Target, 4 Tools
Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Evaluate coverage, speed, finding quality, and practical trade-offs of each tool.
Lab: Exploring Embedding Spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
Lab: Mapping an AI System's Attack Surface
Hands-on lab walking through reconnaissance of an AI system — identifying components, mapping data flows, enumerating tools, and documenting the attack surface.
Lab: Simulated Robot Control Exploitation
Hands-on lab exercises exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains using PyBullet.
Advanced RAG Manipulation Lab
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: Generating Adversarial Suffixes
Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.
Lab: Attack Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Lab: Backdoor Detection in Fine-Tuned Models
Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.
Lab: Continuous Automated Red Teaming Pipeline
Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Containerized Model Breakout
Explore techniques for escaping from containerized AI applications to the host system, testing container isolation boundaries in ML deployment environments.
Lab: Cross-Model Transfer Attacks
Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Building a Production Red Team Harness
Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Lab: Ensemble Attacks
Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.
Lab: Federated Learning Poisoning Attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
Lab: Inserting a Fine-Tuning Backdoor
Advanced lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.
Lab: Build an AI Fuzzing Harness
Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.
Lab: GPU Side-Channel Attacks
Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Lab: Chaining Guardrail Bypasses
Advanced lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.
Lab: AI Incident Investigation
Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.
Lab: Inference Server Exploitation
Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Serving Framework Attacks
Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.
Lab: Multi-Model Comparative Red Teaming
Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Multimodal Attack Pipeline
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
Lab: Implementing PAIR Attack
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.
Lab: PAIR Attack Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Model Exploitation
Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
Lab: Red Team Orchestration
Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.
Lab: Safety Regression Testing at Scale
Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Lab: Model Routing Exploitation
Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.
Lab: TAP Algorithm Implementation
Implement the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.
Lab: Training Data Extraction at Scale
Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API Key Security
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
Lab: API-Based Model Testing
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Lab: Build Your First Defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Compare Model Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Basic Defense Evasion
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Encoding & Obfuscation Techniques
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
Lab: Setting Up Your Red Team Environment
Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Ethical Red Teaming (Beginner Lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Scanning with Garak
Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Lab: Basic Jailbreak Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Multi-Turn Escalation Attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Lab: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Lab: Output Format Manipulation
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Role-Play & Persona Attacks
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Building a Simple Test Harness
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
Lab: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: System Prompt Extraction
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
Expert AI Red Team Labs
Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.
Activation Steering for Exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Lab: Adversarial ML From Scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Lab: Build Agent Security Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: Defense Engineering Lab
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: Full-Stack AI Exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: Computer Use Agent Exploitation
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: Multi-Agent Attack Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML Pipeline Poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: Exploiting Quantized Model Weaknesses
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: RLHF Reward Hacking
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-End Training Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Lab: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced RAG Poisoning
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Lab: Exploiting AI Agents
Hands-on lab for exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.
Lab: Agent Memory Manipulation
Hands-on lab for injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.
Lab: Agent Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
API Abuse Pattern Lab
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
Lab: Automated Red Team Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Azure Content Filter Evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
Lab: AWS Bedrock Guardrails Testing
Hands-on lab for systematically testing and bypassing AWS Bedrock's built-in guardrails including content filters, denied topics, and word filters.
Lab: Chunking Exploitation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
Lab: Citation Fabrication
Hands-on lab for getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Context Window Overflow Attacks
Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Lab: Context Window Stuffing Attacks
Hands-on lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Lab: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Lab: Data Exfiltration Techniques
Hands-on lab for extracting sensitive data from AI systems including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.
Lab: Systematic Defense Bypass
Hands-on lab for methodically identifying, categorizing, and bypassing common LLM defense mechanisms including input filters, output filters, and system prompt hardening.
Lab: Defense Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Lab: Embedding Space Manipulation
Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.
Lab: Encoding Bypass Lab
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Lab: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
Lab: Few-Shot Example Injection
Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Function Calling & Tool Use Abuse
Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.
Lab: Function Calling Injection
Hands-on lab for exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Image-Based Prompt Injection
Hands-on lab exploring how text instructions embedded in images can be used to perform prompt injection against vision-language models (VLMs) that process visual input.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Lab: Indirect Prompt Injection Chains
Hands-on lab for setting up indirect prompt injection scenarios through web pages, emails, and documents, testing multi-hop injection chains against AI systems.
Lab: JSON Input Injection
Hands-on lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior, bypass schema validation, and exploit parsing inconsistencies.
Lab: Building an LLM Judge Evaluator
Hands-on lab for building an LLM-based evaluator to score red team attack outputs, compare model vulnerability, and lay the foundation for automated attack campaigns.
Lab: Markdown-Based Injection
Hands-on lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content, exfiltrate data through image tags, and manipulate displayed information.
Lab: MCP Tool Abuse Scenarios
Hands-on lab for exploiting Model Context Protocol tool definitions through malicious tool descriptions, tool confusion attacks, and server impersonation.
Lab: MCP Server Exploitation
Hands-on lab for attacking a vulnerable MCP server, demonstrating tool shadowing, consent phishing, and server impersonation techniques.
Lab: Agent Memory Poisoning
Hands-on lab exploring how conversational memory in AI agents can be poisoned to alter future behavior, inject persistent instructions, and exfiltrate data across sessions.
Lab: RAG Metadata Injection
Hands-on lab for exploiting metadata fields like titles, descriptions, and timestamps to manipulate RAG retrieval ranking and influence responses.
Lab: Basic Model Extraction
Hands-on lab for API-based model extraction attacks, querying a target model to approximate its behavior, measuring fidelity, and understanding query budgets.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn Progressive Escalation
Hands-on lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment and boundary pushing.
Lab: Multi-Turn Attack Campaigns
Hands-on lab for executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Agent Permission Escalation
Hands-on lab for tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Lab: Regression Testing with promptfoo
Hands-on lab for setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.
Lab: Running PyRIT Campaigns
Hands-on lab for configuring and running automated red teaming campaigns using Microsoft's PyRIT framework against LLM applications.
Lab: RAG Pipeline Poisoning
Hands-on lab for setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Recursive Injection Chains
Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.
Lab: Re-ranking Attacks
Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users data or escalate privileges.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: Token Smuggling & Encoding Bypass
Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.
Lab: Tool Result Poisoning
Hands-on lab for poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.
Lab: Language Translation Bypass
Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.
Lab: Advanced Token Smuggling via Unicode Normalization
Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Lab: Workflow Hijacking
Hands-on lab for redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.
Lab: Crafting Audio Adversarial Examples
Hands-on lab creating adversarial audio examples using Python audio processing, targeting Whisper transcription with injected text.
Lab: Multi-Modal Attack Chain
Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.
Lab: Video Model Adversarial Attacks
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
Lab: Crafting Image-Based Injections
Hands-on lab for creating image-based prompt injections, testing against VLMs, and measuring success rates across different injection techniques.
Red Team Lab & Operations
Operational foundations for AI red teaming: lab environments, evidence handling, engagement workflows, and team management for professional AI security assessments.
Lab: Attacking Federated Learning
Hands-on lab implementing model poisoning attacks in a simulated federated learning setup using the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
Lab: Exploiting Quantized Models
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and crafting quantization-aware exploits.
Lab: Inserting a Fine-Tuning Backdoor (Training Pipeline)
Hands-on lab for creating, inserting, and detecting a trigger-based backdoor in a language model through fine-tuning, using LoRA adapters on a local model.
Lab: Poisoning a Training Dataset
Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, backdoor trigger measurement, and troubleshooting guidance.
建立你的 AI 紅隊實驗室
使用開源模型、測試框架與真實目標應用程式設定本地 AI 紅隊演練實驗室的實務指南。
實驗室: Systematically Bypassing Guardrails
Hands-on lab for methodically probing, classifying, and bypassing input/output guardrails in production AI systems using a structured red team workflow.
實作:工具比較——同一目標,4 個工具
對同一目標模型比較 Garak、PyRIT、promptfoo 與 Inspect AI 之實作。評估各工具之覆蓋、速度、發現品質與實務取捨。
實作:探索 Embedding 空間
使用 Python 視覺化 embedding 空間、量測語意相似度,並展示如何打造與目標查詢匹配之對抗文件之實作。
實作:繪製 AI 系統的攻擊面
對 AI 系統進行偵察的實作課程——辨識元件、繪製資料流、枚舉工具,並撰寫攻擊面文件。
實驗室: Simulated Robot Control 利用ation
Hands-on lab exercises exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains using PyBullet.
進階 RAG Manipulation 實驗室
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
實驗室: Generating Adversarial Suffixes
Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.
實驗室: 攻擊 Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
實驗室: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation 實驗室
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
實驗室: Backdoor Detection in Fine-Tuned 模型s
Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.
實驗室: Continuous Automated 紅隊演練 Pipeline
Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.
實驗室: Cloud AI 安全 評量
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
實驗室: Cloud AI 評量
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
實作:容器化模型突破
探索自容器化 AI 應用逃逸至主機系統之技術,測試 ML 部署環境中之容器隔離邊界。
實驗室: Cross-模型 Transfer 攻擊s
Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.
Cross-Plugin 利用ation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Custom Safety Classifier 訓練
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
實驗室: Building a Production 紅隊 Harness
Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.
實驗室: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
防禦 vs 紅隊 War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
實驗室: Ensemble 攻擊s
Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.
實驗室: Federated Learning 投毒 攻擊
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
實驗室: Inserting a Fine-Tuning Backdoor
進階 lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.
實驗室: Build an AI Fuzzing Harness
Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.
實驗室: GPU Side-Channel 攻擊s
Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.
Gradient-指南d Data 投毒
Use gradient information from open-source models to craft optimally poisoned training examples.
實驗室: Chaining Guardrail Bypasses
進階 lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.
實驗室: AI Incident Investigation
Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.
實驗室: Inference Server 利用ation
攻擊 vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
實驗室: MCP Full 利用ation
Hands-on lab for conducting a complete 模型 Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
實驗室: 模型 安全 Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
模型 Distillation 安全 實驗室
Extract model capabilities through distillation techniques using only black-box API access.
實驗室: 模型 Serving Framework 攻擊s
利用 vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.
實驗室: Multi-模型 Comparative 紅隊演練
Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.
Multi-Objective 攻擊 Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
實驗室: 進階 Multi-Turn 攻擊s
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multimodal 攻擊 Chain 實驗室
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
實驗室: Multimodal 攻擊 Pipeline
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
實驗室: Implementing PAIR 攻擊
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.
實驗室: PAIR 攻擊 Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Production Environment Simulation 實驗室
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
實驗室: Purple Team 練習
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
實驗室: Purple Team 練習 (進階 實驗室)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
實驗室: RAG Full Chain 攻擊
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
實驗室: Reasoning 模型 利用ation
攻擊 reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
實驗室: 紅隊 Orchestration
Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.
實驗室: Safety Regression Testing at Scale
Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Reward 模型 Analysis 實驗室
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
實驗室: 模型 Routing 利用ation
攻擊 systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.
實驗室: TAP Algorithm Implementation
Implement the TAP (Tree of 攻擊s with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.
實驗室: 訓練 Data Extraction at Scale
Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.
實驗室: Transfer 攻擊 Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
實驗室: Transfer 攻擊 Development (進階 實驗室)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
實驗室: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
實驗室: API Key 安全
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
實驗室: API-Based 模型 Testing
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
實驗室: Build Your First 防禦
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic 防禦 Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
實驗室: Build Your First 防禦 (初階 實驗室)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
安全 Finding Documentation 練習
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
實驗室: 模型 Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
實驗室: Compare 模型 Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
實驗室: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
實驗室: Context Overflow 攻擊s
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
實驗室: 防禦 Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
實驗室: Basic 防禦 Evasion
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
實驗室: Delimiter Injection 攻擊s
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
實驗室: Delimiter Escape 攻擊s
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Embedding Basics for 安全
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
實驗室: Encoding & Obfuscation Techniques
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
實作:設置你的紅隊環境
設置完整 AI 紅隊環境之逐步指南——Python、API 客戶端、掃描工具與本地模型。
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
實驗室: Ethical 紅隊演練
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
實驗室: Ethical 紅隊演練 (初階 實驗室)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
實驗室: Few-Shot Manipulation 攻擊s
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
實驗室: Your First 提示詞注入
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
實驗室: Your First 越獄
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
實作:以 Garak 掃描
安裝並對本地模型執行 Garak LLM 漏洞掃描器。學習組態 probe、generator 與 evaluator 以進行完整掃描。
實驗室: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Hello World 提示詞注入
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
實驗室: Injection Detection 工具
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
實驗室: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
實驗室: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
實驗室: Basic 越獄 Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output 利用ation Basics
利用 LLM JSON mode to bypass content filters and extract information through structured output channels.
實驗室: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
模型 安全 Comparison 實驗室
Compare the security posture of different LLM models by running identical test suites across providers.
實驗室: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
實驗室: Multi-Turn Escalation 攻擊s
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
實驗室: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Output Format Control 實驗室
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
實驗室: Output Format 利用ation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
實驗室: Output Format Manipulation
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
實驗室: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
實驗室: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
實驗室: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
LLM Playground 安全 Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
實驗室: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis 實驗室
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
實驗室: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
實驗室: PyRIT Setup and First 攻擊
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Response Analysis 基礎
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
實驗室: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
實驗室: Role-Play 攻擊s
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
實驗室: Role-Play & Persona 攻擊s
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
實驗室: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Safety 訓練 Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
實驗室: Building a Simple Test Harness
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
實驗室: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
實驗室: System Prompt Extraction
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
實驗室: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
實驗室: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
紅隊 工具 Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
專家 AI 紅隊實驗室
處理尖端 AI 安全挑戰的進階實驗室,包含量化利用、獎勵駭客、代理利用、多代理攻擊與水印移除。
Activation Steering for 利用ation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
實驗室: Adversarial ML From Scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
實驗室: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
實驗室: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
實驗室: Build 代理 安全 Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
實驗室: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
實驗室: Build Behavior Diff 工具
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
實驗室: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
實驗室: Build 越獄 Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Cross-Architecture Transfer 實驗室
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
實驗室: 防禦 Engineering 實驗室
專家-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Emergent Behavior 利用ation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
實驗室: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Formal Verification Limits 實驗室
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
實驗室: Full-Stack AI 利用ation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
實作:Computer Use 代理攻擊
探索針對以 GUI 為本之 computer use 代理(透過視覺與動作原語與畫面互動)之攻擊向量的實作課程。
實驗室: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-指南d 攻擊 Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
MoE Routing 利用ation 實驗室
利用 Mixture-of-專家s routing mechanisms to selectively activate or suppress expert modules in MoE models.
實驗室: Multi-代理 攻擊 Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
實驗室: Novel 越獄 Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
實驗室: ML Pipeline 投毒
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
實作:利用量化模型弱點
實作探索模型量化如何退化安全對齊之實作,含尋找並利用與精度相關漏洞之技術。
實驗室: 模型 Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
實驗室: 漏洞 Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
實作:RLHF 獎勵 hacking
探索如何鑽營用於 RLHF 對齊之獎勵模型——暴露獎勵代理訊號與實際安全目標之間缺口之實作。
實驗室: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-End 訓練 Time 攻擊s
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
實驗室: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
進階 Encoding Chain 攻擊s
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
實驗室: 進階 RAG 投毒
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
實驗室: 利用ing AI 代理s
Hands-on lab for exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.
實驗室: 代理 記憶體 Manipulation
Hands-on lab for injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.
實驗室: 代理 Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
API Abuse Pattern 實驗室
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
實驗室: Automated 紅隊 Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
實驗室: Automated 紅隊 Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
實驗室: Azure Content Filter Evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
實驗室: AWS Bedrock Guardrails Testing
Hands-on lab for systematically testing and bypassing AWS Bedrock's built-in guardrails including content filters, denied topics, and word filters.
實驗室: Chunking 利用ation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
實驗室: Citation Fabrication
Hands-on lab for getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.
ML Classifier Evasion 實驗室
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
實驗室: Context Window Overflow 攻擊s
Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.
Context Window 投毒 實驗室
利用 context window management to inject persistent adversarial content that influences future model responses.
實驗室: Context Window Stuffing 攻擊s
Hands-on lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.
Conversation Splitting 攻擊 實驗室
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
實驗室: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
實驗室: Data Exfiltration Channels (中階 實驗室)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
實驗室: Data Exfiltration Techniques
Hands-on lab for extracting sensitive data from AI systems including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.
實驗室: Systematic 防禦 Bypass
Hands-on lab for methodically identifying, categorizing, and bypassing common LLM defense mechanisms including input filters, output filters, and system prompt hardening.
實驗室: 防禦 Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
實驗室: 防禦 Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection 實驗室
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
實作:Embedding 空間操弄
使用 sentence-transformers 經 embedding 碰撞攻擊打造為特定查詢被檢索而最佳化之文件之實作。
實驗室: Encoding Bypass 實驗室
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
實驗室: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
實驗室: Few-Shot Example Injection
Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.
Fine-Tuning API 安全 Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
實驗室: Function Calling & 工具 Use Abuse
Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.
實驗室: Function Calling Injection
Hands-on lab for exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.
Guardrail Rule Extraction 實驗室
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
實驗室: Image-Based 提示詞注入
Hands-on lab exploring how text instructions embedded in images can be used to perform prompt injection against vision-language models (VLMs) that process visual input.
實驗室: Indirect 提示詞注入
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
實驗室: Indirect 提示詞注入 Chains
Hands-on lab for setting up indirect prompt injection scenarios through web pages, emails, and documents, testing multi-hop injection chains against AI systems.
實驗室: JSON Input Injection
Hands-on lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior, bypass schema validation, and exploit parsing inconsistencies.
實驗:建立 LLM 裁判評估器
為建立 LLM 基評估器以對紅隊攻擊輸出評分、比較模型脆弱度並為自動化攻擊活動奠基之實作實驗。
實驗室: Markdown-Based Injection
Hands-on lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content, exfiltrate data through image tags, and manipulate displayed information.
實驗室: MCP 工具 Abuse Scenarios
Hands-on lab for exploiting 模型 Context Protocol tool definitions through malicious tool descriptions, tool confusion attacks, and server impersonation.
實驗室: MCP Server 利用ation
Hands-on lab for attacking a vulnerable MCP server, demonstrating tool shadowing, consent phishing, and server impersonation techniques.
實驗室: 代理 記憶體 投毒
Hands-on lab exploring how conversational memory in AI agents can be poisoned to alter future behavior, inject persistent instructions, and exfiltrate data across sessions.
實驗室: RAG Metadata Injection
Hands-on lab for exploiting metadata fields like titles, descriptions, and timestamps to manipulate RAG retrieval ranking and influence responses.
實作:基礎模型提取
為以 API 為本之模型提取攻擊之實作,查詢目標模型以近似其行為、量測保真度,並理解查詢預算。
Practical 模型 Fingerprinting 實驗室
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Multi-Language Safety Bypass 實驗室
利用 gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-模型 Pipeline 攻擊 實驗室
攻擊 a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
實驗室: Multi-Turn Progressive Escalation
Hands-on lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment and boundary pushing.
實驗室: Multi-Turn 攻擊 Campaigns
Hands-on lab for executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.
實驗室: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
實驗室: Multimodal Injection (中階 實驗室)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Output Channel Discovery 實驗室
Discover alternative output channels in LLM applications that bypass standard output filtering.
實驗室: 代理 Permission Escalation
Hands-on lab for tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.
Prompt Leakage Technique 實驗室
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
實作:以 promptfoo 進行回歸測試
設置 promptfoo 以對 LLM 應用執行自動化回歸測試之實作,確保安全屬性於模型更新與提示變更間保持。
實驗室: Running PyRIT Campaigns
Hands-on lab for configuring and running automated red teaming campaigns using Microsoft's PyRIT framework against LLM applications.
實驗室: RAG Pipeline 投毒
Hands-on lab for setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
實驗室: Recursive Injection Chains
Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.
實驗室: Re-ranking 攻擊s
Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.
Semantic Deception Crafting 實驗室
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Session State 利用ation 實驗室
利用 session management vulnerabilities in LLM applications to access other users data or escalate privileges.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
實驗室: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
實驗室: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
實驗室: Token Smuggling & Encoding Bypass
Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.
實驗室: 工具 Result 投毒
Hands-on lab for poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.
實驗室: Language Translation Bypass
Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.
實驗室: 進階 Token Smuggling via Unicode Normalization
利用 Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
實驗室: Workflow Hijacking
Hands-on lab for redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.
實作:打造音訊對抗範例
使用 Python 音訊處理建立對抗音訊範例之實作,針對 Whisper 轉譯注入文字。
實驗室: Multi-Modal 攻擊 Chain
Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.
實驗室: Video 模型 Adversarial 攻擊s
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
實驗室: Crafting Image-Based Injections
Hands-on lab for creating image-based prompt injections, testing against VLMs, and measuring success rates across different injection techniques.
紅隊實驗室與營運
AI 紅隊演練的營運基礎:實驗室環境、證據處理、案件工作流程,以及專業 AI 安全評估的團隊管理。
實驗室: 攻擊ing Federated Learning
Hands-on lab implementing model poisoning attacks in a simulated federated learning setup using the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
實驗室: 利用ing Quantized 模型s
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and crafting quantization-aware exploits.
實驗室: Inserting a Fine-Tuning Backdoor (訓練 Pipeline)
Hands-on lab for creating, inserting, and detecting a trigger-based backdoor in a language model through fine-tuning, using LoRA adapters on a local model.
實驗室: 投毒 a 訓練 Dataset
Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, backdoor trigger measurement, and troubleshooting guidance.