# lab

labtoolscomparisonmethodology

實作：工具比較——同一目標，4 個工具

對同一目標模型比較 Garak、PyRIT、promptfoo 與 Inspect AI 之實作。評估各工具之覆蓋、速度、發現品質與實務取捨。

labembeddingshands-onpythonintermediate

實作：探索嵌入空間

使用 Python 視覺化嵌入空間、測量語意相似度，並展示如何打造與目標查詢匹配的對抗性文件之動手實作。

labreconnaissanceattack-surfacemappingintermediate

實作：繪製 AI 系統的攻擊面

對 AI 系統進行偵察的實作課程——辨識元件、繪製資料流、枚舉工具，並撰寫攻擊面文件。

實驗室: Simulated Robot Control 利用ation

Hands-on lab exercises exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains using PyBullet.

labrobotsimulation

advancedlabragmanipulationlabs

進階 RAG 操控實驗室

執行精密的RAG 操控,包括嵌入空間攻擊,中繼資料投毒,檢索演算法作弊.

advancedlabadversarialrobustnessevallabs

對抗性穩健度評估

建構完整的adversarial 穩健度評估 framework,用於評估模型安全姿態.

實作：生成對抗性後綴

實作 the Greedy Coordinate Gradient (GCG) algorithm to generate 對抗性 suffixes that cause 語言模型 to comply with harmful requests by appending optimized 符元 sequences.

labadversarial-suffixgcg

labattack-chainexploitmulti-stageadvanced

實作：攻擊鏈構建

動手實作 for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.

labaudioadversarialmultimodaladvancedhands-on

實作：音訊對抗樣本

動手實作 for crafting 對抗性 audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.

autodanadvancedlabimplementationlabs

AutoDAN 實作實驗室

實作the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.

labbackdoordetectionforensicsfine-tuning

Lab: Backdoor Detection in Fine-Tuned Models

分析 a fine-tuned language model to find and characterize an inserted 後門, using behavioral probing, activation analysis, and statistical testing 技術.

Lab: Continuous Automated Red Teaming Pipeline

建構 an end-to-end CART pipeline that continuously generates, executes, and scores 對抗性 attacks against LLM applications, with alerting and trend tracking.

labcartcontinuous

labcloudassessmentend-to-endapi-securityadvancedhands-on

實作：雲端 AI 安全評估

Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.

labcloudassessmentinfrastructuresecurityadvanced

實驗室:雲端 AI 評估

動手實驗室,主題為conducting an 端對端安全評估 of a 雲端部署 AI system,包括基礎設施檢視,API 測試,模型安全 evaluation,資料流 analysis.

labcontainer-securitybreakoutinfrastructure

實作：容器化模型突破

探索自容器化 AI 應用逃逸至主機系統之技術，測試 ML 部署環境中之容器隔離邊界。

labcross-modeltransfer-attacksmulti-model

實作：跨模型遷移攻擊

測試 whether 越獄s discovered on one language model transfer effectively to others, building a systematic methodology for cross-model 漏洞 research.

exploitationadvancedcrosslabpluginlabs

Cross-Plugin 利用鏈

鏈 exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.

classifiercustomadvancedlabtraininglabs

Custom 安全 Classifier 訓練

Train a custom input safety classifier與then develop payloads that reliably evade it to underst與classifier limitations.

labharnessproductionframework

Lab: Building a Production Red Team Harness

建構 a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.

labcustom-harnesstesting-frameworkdomain-specificadvancedhands-on

Lab: Custom Test Harness for Specific Applications

建構 a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.

defenseadvancedlabteamredlabs

防禦 vs Red Team War Game

Alternate between attacking與defending an LLM application to develop skills in both offensive與defensive operations.

labensemble-attacksmulti-modelautomated-attacks

實作：集成攻擊

Use multiple 語言模型 collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective 紅隊演練.

labfederated-learningpoisoningexperthands-on

Lab: Federated Learning Poisoning Attack

動手實作 for understanding and simulating 投毒 attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted 梯度 updates.

Lab: Inserting a Fine-Tuning Backdoor

Advanced lab demonstrating how 微調 can insert hidden 後門s into 語言模型 that activate on specific trigger phrases while maintaining normal behavior otherwise.

labfine-tuningbackdoor

labfuzzingautomated-testingboundary-testing

Lab: Build an AI Fuzzing Harness

Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.

labgpuside-channelhardwareinformation-leakage

實作：GPU 側通道攻擊

Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.

advancedlabgradientguidedpoisoninglabs

Gradient-Guided Data 投毒

Use gradient information from open-source models to craft optimally poisoned training examples.

Lab: Chaining Guardrail Bypasses

Advanced lab on identifying, isolating, and chaining multiple 護欄 bypass 技術 to defeat layered defense systems in production LLM applications.

labguardrailsbypass

labforensicsincident-responseinvestigation

Lab: AI Incident Investigation

Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the 漏洞 exploited, and determine the scope of the breach.

labinference-serverinfrastructurevllmtriton

實作：推論伺服器漏洞利用

Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.

labmcp漏洞利用tool-usefull-chainadvanced

Lab: MCP Full Exploitation

動手實作 for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.

labmodel-comparisonsafety-evaluationbenchmarking

Lab: Model Security Comparison

系統性 compare the safety posture of major 語言模型 using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.

distillationadvancedlabattacklabsmodel

模型 Distillation 安全實驗室

萃取model capabilities through distillation techniques using only 黑盒 API access.

labmodel-servinginfrastructuretensorflow-servingtorchserve

Lab: Model Serving Framework Attacks

Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.

labmulti-modelcomparativebenchmarking

Lab: Multi-Model Comparative Red Teaming

測試 the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.

advancedlablabsattackmultiobjective

Multi-Objective 攻擊優化

Optimize attack payloads for multiple simultaneous objectives: jailbreaking,data extraction,防禦規避.

labmulti-turnconversation-steeringcrescendoadvanced

Lab: Advanced Multi-Turn Attacks

動手實作 for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.

chainadvancedlabmultimodalattacklabs

Multimodal 攻擊鏈實驗室

鏈 attacks across text,image,structured data modalities to exploit multimodal system vulnerabilities.

labmultimodalpipelinevision

Lab: Multimodal Attack Pipeline

建構 an automated multimodal attack pipeline that generates 對抗性 images, combines them with text prompts, and tests against vision-語言模型 (VLMs).

labpairautomatedattack-algorithm

Lab: Implementing PAIR Attack

實作 the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines 越獄 prompts against a target LLM until a successful attack is found.

labpairautomated-attacksjailbreaking

實作：PAIR 攻擊實作

建構 a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine 越獄 prompts against a target model.

productionadvancedlabsimulationlabs

Production Environment Simulation 實驗室

測試attacks 對抗 a simulated production environment with realistic logging,monitoring,alerting.

labpurple-teamattack-defensecollaborativeadvancedhands-on

實作：紫隊演練

Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.

labpurple-teamattack-defensemetricsadvanced

Lab: Purple Team Exercise (Advanced Lab)

動手實作 for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.

labragfull-chaindocument-injectionexfiltrationadvanced

Lab: RAG Full Chain Attack

動手實作 for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.

labreasoningo1chain-of-thought

實作：推理模型漏洞利用

Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection 技術.

laborchestrationautomationred-team-ops

實作：紅隊編排

建構 an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.

labregression-testingsafetyautomationci-cd

Lab: Safety Regression Testing at Scale

建構 automated pipelines that detect safety degradation across model versions, ensuring that updates and 微調 do not introduce new vulnerabilities or weaken existing protections.

probingadvancedlabrepresentationlabs

Representation Probing for Vulnerabilities

Probe internal model representations to identify exploitable features與develop representation-level attacks.

analysisadvancedlabrewardlabsmodel

Reward 模型分析實驗室

Analyze與exploit reward model biases to craft responses that score high on safety while embedding harmful content.

labroutingmodel-selectionarchitecture-attacks

Lab: Model Routing Exploitation

Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.

labtaptree-of-attacksalgorithm

Lab: TAP Algorithm Implementation

實作 the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find 越獄s.

labtraining-dataextractionprivacy

Lab: Training Data Extraction at Scale

Extract memorized training data from 語言模型 using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.

labtransfer-attackscross-modeladversarialadvanced

Lab: Transfer Attack Development

動手實作 for crafting 對抗性 prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.

labtransfer-attacksadversarialcross-modeladvancedhands-on

Lab: Transfer Attack Development (Advanced Lab)

開發對抗性 attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box 漏洞利用.

labanthropicclaudeapibeginner

Lab: Anthropic Claude API Basics

設定 the Anthropic Claude API for 紅隊演練, learn authentication, the Messages API, 系統提示詞s, and how temperature and top-p affect attack success rates.

explorationlabbeginnerlabsmultiapi

Multi-Provider API Exploration

Explore與compare API behaviors across OpenAI,Anthropic,Google AI to underst與provider-specific security characteristics.

labapi-keyssecuritysecrets-managementbeginner

Lab: API Key Security

學習 common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.

Lab: API-Based Model Testing

學習 to test 語言模型 through their APIs including OpenAI, Anthropic, and local Ollama endpoints. 建構 reusable API testing functions with proper error handling.

labapitestingopenai

basiclabbeginnerlabsautomatedtesting

Basic Automated 測試 Setup

Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.

labdefenseinput-filterregexbeginner

Lab: Build Your First Defense

動手實作 for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block 提示詞注入 attempts.

testingdefensebasiclabbeginnerlabs

Basic 防禦 Mechanism 測試

Identify與categorize the defensive mechanisms present in a target LLM application through structured probing.

labdefenseinput-filterbeginnerhands-on

Lab: Build Your First Defense (Beginner Lab)

Create a simple input filter that blocks common 提示詞注入 patterns, then test it against the attack 技術 you have learned in previous labs.

reportingbasiclabbeginnerexerciselabs

安全 Finding Documentation 練習

Practice documenting security findings in a professional format with reproducible steps與impact assessment.

labmodel-comparisonsafetybenchmarkingbeginnerhands-on

實作：模型比較

測試 the same attack 技術 against different 語言模型 and compare their safety behaviors, refusal patterns, and 漏洞 profiles.

labcomparisonsafetymulti-modelbeginner

Lab: Compare Model Safety

動手實作 for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle 提示詞注入, 越獄s, and safety boundary enforcement.

labcontext-manipulationprompt-injectionpersonabeginnerhands-on

實作：上下文操控

Manipulate the conversational context to change model behavior, using 技術 like context flooding, false history injection, and persona shifting.

labcontext-overflowprompt-injectionattentionbeginner

Lab: Context Overflow Attacks

探索 context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.

historyconversationlabbeginnerbasicslabs

Conversation History 分析

分析how conversation history influences model behavior與identify opportunities for context manipulation.

labdefense-bypassevasionfiltersbeginnerhands-on

實作：防禦繞過基礎

學習 basic 技術 to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.

labdefense-evasionfiltersbypass

Lab: Basic Defense Evasion

Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.

labdelimiter-injectionprompt-injectionxmlmarkdownbeginner

Lab: Delimiter Injection Attacks

Craft payloads that exploit XML, markdown, and special 符元 delimiters to break instruction boundaries and manipulate how LLMs parse 系統提示詞s versus user input.

labdelimiter-escapeprompt-injectionboundariesbeginnerhands-on

實作：分隔符逃脫攻擊

Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.

securitylabbeginnerembeddingbasicslabs

嵌入 Basics for 安全

Underst與text embeddings與their security relevance by generating,comparing,manipulating embedding vectors.

Lab: Encoding & Obfuscation Techniques

動手實作 exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.

labencodingobfuscation

實作：設置你的紅隊環境

設置完整 AI 紅隊環境之逐步指南——Python、API 客戶端、掃描工具與本地模型。

labenvironmentsetuptools

errorlabbeginnerintelligencelabsmessage

Error Message Intelligence Gathering

萃取system architecture information from error messages與response patterns in LLM applications.

labethicsresponsibledocumentationbeginnerhands-on

Lab: Ethical Red Teaming

Practice responsible AI 紅隊演練 with proper documentation, scope management, and ethical decision-making frameworks.

labethicsresponsible-testingdocumentationbeginner

Lab: Ethical Red Teaming (Beginner Lab)

動手實作 for practicing responsible AI 紅隊演練 with proper documentation, scope management, ethical boundaries, and disclosure procedures.

labfew-shotin-context-learningprompt-injectionbeginner

Lab: Few-Shot Manipulation Attacks

Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired 對抗性 behavior through fabricated conversation examples.

labprompt-injectionbeginnerhands-on

實作：你的第一個提示詞注入

動手實作 where you perform your first 提示詞注入 attacks against a chatbot, progressing from basic overrides to more sophisticated 技術.

labjailbreakollamabeginnerhands-on

實作：你的第一個越獄嘗試

Try basic 越獄技術 against a local model using Ollama, learning the difference between 提示詞注入 and 越獄ing through hands-on experimentation.

實作：以 Garak 掃描

安裝並對本地模型執行 Garak LLM 漏洞掃描器。學習組態 probe、generator 與 evaluator 以進行完整掃描。

labgarakscanningtools

labgaraktool-setup漏洞-scannerbeginnerhands-on

實作：Garak 設定與首次掃描

Install and configure garak, the LLM 漏洞 scanner, then run your first automated scan against a local model to discover common vulnerabilities.

injectionlabbeginnerlabshelloworld

Hello World 提示詞注入

Write與test your first 提示詞注入 payload 對抗 a simple chatbot to underst與the fundamental attack mechanism.

labinjection-detectiondefenseclassificationbeginnerhands-on

Lab: Injection Detection Tool

建構 a basic 提示詞注入 detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.

labprompt-injectionsurveytechniquesbeginnerhands-on

實作：注入技術概覽

Survey and test ten fundamental 提示詞注入技術 against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.

labinstruction-followingpriorityconflicting-instructionsbeginnerhands-on

Lab: Instruction Following Priority

測試 how 語言模型 prioritize conflicting instructions from 系統提示詞s, user messages, and embedded directives to understand the instruction hierarchy.

labjailbreakbeginnertechniques

Lab: Basic Jailbreak Techniques

Hands-on exploration of 越獄技術 including role-play, DAN-style prompts, and academic framing against multiple models.

exploitationoutputlabbeginnerlabsjson

JSON Output 利用 Basics

利用 LLM JSON mode to bypass content filters與extract information through structured output channels.

labmarkdown-injectiondata-exfiltrationrenderingbeginner

實作：Markdown 注入

Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.

comparisonsecuritylabbeginnerlabsmodel

模型安全 Comparison 實驗室

Compare the security posture of different LLM models by running identical test suites across providers.

labmulti-languageprompt-injectiontranslationbeginnerhands-on

實作：多語言注入

測試提示詞注入技術 using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.

labmulti-turnescalationsocial-engineeringbeginner

Lab: Multi-Turn Escalation Attacks

建構 trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation 技術.

labollamasetuplocal-modelsbeginner

Lab: Setting Up Ollama for Local LLM Testing

Install and configure Ollama for local LLM 紅隊演練, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.

outputformatlabbeginnercontrollabs

Output Format Control 實驗室

Practice controlling LLM output formats to underst與how format constraints affect safety mechanism effectiveness.

laboutput-format漏洞利用data-leakagebeginnerhands-on

實作：輸出格式利用

Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.

laboutputformatmanipulation

實作：輸出格式操控

動手實作 on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output 漏洞利用 for data exfiltration.

laboutput-steeringprompt-injectionoutput-manipulationbeginnerhands-on

實作：輸出引導

學習 to steer LLM outputs into specific formats, tones, and content using injection 技術 that manipulate how the model structures and presents its responses.

labpayload-craftingprompt-injectionred-teamingbeginnerhands-on

實作：載荷製作

學習 to craft effective 提示詞注入 payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.

labencodingbypasscontent-filtersbeginner

Lab: Encoding Bypasses

測試 Base64, ROT13, hex, and Unicode encoding 技術 to evade LLM content filters while preserving the semantic meaning of 對抗性 payloads.

labbeginnerlabsplaygroundtesting

LLM Playground 安全測試

Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.

labprompt-leakingsystem-promptextractionbeginnerhands-on

Lab: Prompt Leaking Basics

學習 to extract 系統提示詞s from chatbots using common leaking 技術 including direct requests, indirect elicitation, and format manipulation.

analysisstructurelabbeginnerpromptlabs

提示詞 Structure 分析實驗室

Reverse-engineer the prompt structure of target applications by analyzing response patterns與behavioral cues.

labpromptfootool-setupevaluationbeginnerhands-on

Lab: Promptfoo Setup and First Eval

Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.

labpyrittool-setupred-teamingmicrosoftbeginnerhands-on

Lab: PyRIT Setup and First Attack

Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated 紅隊演練, then run your first orchestrated attack against a local model.

mappingratelabbeginnerlimitlabs

Rate Limit and Quota Mapping

Map the rate limits,quotas,throttling behavior of LLM APIs to plan efficient testing strategies.

analysisresponselabbeginnerbasicslabs

Response 分析 Fundamentals

學習to analyze LLM responses for information leakage,inconsistencies,indicators of safety mechanism activation.

labresponse-analysisfingerprintingsafety-refusalsbeginner

Lab: Analyzing LLM Responses

學習 to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting 技術 when analyzing LLM outputs during red team testing.

labrole-playpersonajailbreakbeginnerhands-on

實作：角色扮演攻擊

Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.

Lab: Role-Play & Persona Attacks

動手實作 exploring role-play and persona-based 越獄技術, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety 護欄s.

labrole-playpersona

labsafetyboundariesmappingbeginnerhands-on

Lab: Mapping Safety Boundaries

系統性 discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.

safetymappinglabbeginnerboundarylabs

安全 Boundary Mapping

Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics與documenting refusal patterns.

probingsafetylabbeginnertraininglabs

安全訓練 Boundary Probing

Systematically probe the boundaries of RLHF safety training to underst與where與how safety behaviors are enforced.

labharnessautomationpython

實作：建構簡單測試框架

建構 a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.

labstructured-outputjsonschema-injectionbeginner

Lab: Structured Output Manipulation

探索 how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.

labsystem-promptextraction偵察

Lab: System Prompt Extraction

Hands-on 技術 for extracting hidden 系統提示詞s from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.

labprompt-injectionsystem-promptoverridebeginnerhands-on

實作：系統提示詞覆寫

探索技術 to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.

labsystem-prompt偵察extractionbeginner

Lab: System Prompt Reconstruction

Use various 技術 to 系統性地 extract and reconstruct the full 系統提示詞 of an LLM application, combining direct, indirect, and incremental extraction methods.

labbeginnerconfigtoolinstallationlabs

Red Team 工具 Installation and Configuration

Install與configure essential AI red teaming tools: Garak,PyRIT,Promptfoo,custom Python harnesses.

專家 AI 紅隊實驗室

處理尖端 AI 安全挑戰的進階實驗室，包含量化利用、獎勵駭客、代理利用、多代理攻擊與水印移除。

labexpertoverview

labsteeringexpertlabsactivation

運用啟動向量操縱進行利用

運用啟動向量操縱技術，在表徵層面操控模型行為，而不需修改提示詞。

labexpertadversarial-mlgradientsfrom-scratch

Lab: Adversarial ML From Scratch

Hands-on expert lab for implementing 梯度-based 對抗性 attacks against 語言模型 from scratch without frameworks, building intuition for how 對抗性 perturbations exploit model 梯度s.

labexpertadversarial-suffixGCGoptimizationhands-on

實作：對抗性後綴優化

實作 GCG-style 對抗性 suffix attacks that automatically discover 符元 sequences causing 語言模型 to comply with harmful requests. Covers 梯度-based optimization, transferability analysis, and defense evaluation.

labexpertalignmentstress-testingsafetyhands-on

實作：對齊壓力測試

Push language model 對齊 to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure 對齊 degradation curves, and map the boundaries of model compliance.

labexpertagentscannersecurityhands-on

Lab: Build Agent Security Scanner

建構 an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific 攻擊面s that traditional LLM testing misses.

labexpertfuzzertestingadversarialhands-on

實作：建構 AI 模糊測試器

建構 a comprehensive fuzzing framework for large 語言模型 that 系統性地 generates malformed, unexpected, and 對抗性 inputs to discover crashes, unexpected behaviors, and safety bypasses.

labexpertbehavior-diffregressioncomparisonhands-on

Lab: Build Behavior Diff Tool

建構 a tool that 系統性地 compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.

labexpertguardrailsevaluationdefense-testinghands-on

Lab: Build Guardrail Evaluator

建構 an automated framework for evaluating AI 護欄s and safety filters. 測試 input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.

labexpertautomationjailbreakframeworkhands-on

Lab: Build Jailbreak Automation

建構 an automated 越獄 testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.

architecturecrosstransferlabexpertlabs

Cross-Architecture 遷移實驗室

發展attacks on one model architecture that reliably transfer to fundamentally different architectures.

labexpertdefense-engineeringproductionmulti-layer

Lab: Defense Engineering Lab

Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and 對抗性 robustness testing.

exploitationlabexpertbehavioremergentlabs

Emergent Behavior 利用

Identify與exploit emergent behaviors in frontier models that arise from scale與are not present in smaller models.

labexpertemergentcapabilityprobinghands-on

實作：湧現能力探測

系統性 test large 語言模型 for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. 建構 a structured probing framework for capability discovery.

labexpertformalverificationlabslimits

Formal Verification Limits 實驗室

探索the limits of formal verification for neural network behavior與identify gaps exploitable by adversaries.

labexpertfull-stackexploit-chainmulti-layerhands-on

Lab: Full-Stack AI Exploitation

串聯 vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. 學習 to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.

labexpertagentscomputer-useguihands-on

實作：Computer Use 代理攻擊

探索針對以 GUI 為本之 computer use 代理（透過視覺與動作原語與畫面互動）之攻擊向量的實作課程。

labexperthoneypotdeceptiondetectionhands-on

Lab: Deploy Honeypot AI

建構 and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. 學習 honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.

labexpertguidedattacklabsinterpretability

Interpretability-Guided 攻擊 Design

Use mechanistic interpretability to identify exploitable circuits與design targeted attacks.

routingmoeexploitationlabexpertlabs

MoE Routing 利用實驗室

利用 Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

labexpertmulti-agentcoordinationhands-on

Lab: Multi-Agent Attack Coordination

動手實作 building and deploying coordinated multi-agent attack systems that divide 紅隊演練 tasks across specialized attacking agents.

neurallablabsexpertbackdoorengineering

Neural 後門 Engineering

Engineer精密的 neural backdoors that activate on specific trigger patterns while evading detection methods.

labexpertjailbreakresearchnovel-attackshands-on

實作：新型越獄研究

Systematic methodology for discovering new 越獄技術 against large 語言模型. 學習 to identify unexplored 攻擊面s, develop novel attack vectors, and validate findings with scientific rigor.

labexpertpipelinepoisoningsupply-chainhands-on

實作：ML 流水線投毒

Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. 學習 to identify and exploit weaknesses across the full ML lifecycle.

labexpertquantizationmodel-compressionhands-on

實作：利用量化模型弱點

實作探索模型量化如何退化安全對齊之實作，含尋找並利用與精度相關漏洞之技術。

labexpertregistrysupply-chainmodel-integrityhands-on

Lab: Model Registry Compromise

探索技術 for compromising model registries and substituting malicious models into production pipelines. 學習 to detect model tampering, verify model provenance, and secure the model supply chain.

labexpertresearchmethodology漏洞-discovery

Lab: Vulnerability Research Methodology

Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, 攻擊面 mapping, experimental design, validation protocols, and responsible disclosure.

labexpertrlhfreward-hackingalignmenthands-on

實作：RLHF 獎勵 hacking

探索如何鑽營用於 RLHF 對齊之獎勵模型——暴露獎勵代理訊號與實際安全目標之間缺口之實作。

labexpertbenchmarkevaluationsafetyhands-on

Lab: Create a Safety Benchmark

Design, build, and validate a comprehensive AI safety evaluation suite. 學習 benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

attackstimelabexperttraininglabs

End-to-End 訓練 Time Attacks

執行a complete training-time attack from 資料投毒 through model deployment to triggered exploitation.

labexpertwatermarkingdetectionhands-on

Lab: AI Watermark Detection & Removal

動手實作 exploring 技術 for detecting and removing statistical 浮水印s embedded in AI-generated text, and evaluating 浮水印 robustness.

encodingchainsintermediateadvancedlablabs

進階 Encoding 鏈 Attacks

鏈 multiple encoding schemes to bypass multi-layer input filters in production LLM applications.

labragpoisoningadvanceddata-attacks

實驗室: 進階 RAG 投毒

動手實驗室,主題為crafting documents that reliably get retrieved與influence RAG responses for specific target queries.

labagentsexploitationtool-use

實驗室: Exploiting AI 代理

動手實驗室,主題為exploiting AI agents with tool access through indirect 提示詞注入,tool-call chaining,privilege escalation techniques.

labagentsmemorypersistenceagentic-attacks

實驗室: 代理記憶操控

動手實驗室,主題為injecting persistent instructions into an agent's memory與context that affect future interactions與conversations.

labagentsprompt-leakingtool-useintermediate

實驗室: 代理提示詞 Leaking

動手實驗室,主題為extracting system prompts from tool-using AI agents by exploiting tool interactions,observation channels,multi-step reasoning to leak protected instructions.

intermediateabuselabpatternslabsapi

API Abuse Pattern 實驗室

Discover與exploit API abuse patterns,包括 parameter injection,header manipulation,endpoint confusion.

labautomationci-cdpromptfoopipelineintermediate

實驗室:自動化紅隊流水線

動手實驗室,主題為building a continuous AI red team testing pipeline using promptfoo,GitHub Actions,automated attack generation to catch safety regressions before deployment.

labautomated-testingpromptfooci-cdintermediatehands-on

實驗室: Automated Red Team 測試

Set up promptfoo for continuous automated 安全測試 of AI applications,with custom evaluators與CI/CD integration.

labcloudazurecontent-filterevasioncloud-ai

實驗室: Azure Content Filter Evasion

動手實驗室,主題為mapping與testing Azure OpenAI Service content filtering categories,severity levels,bypass techniques.

labcloudawsbedrockguardrailscloud-ai

實驗室: AWS Bedrock 護欄測試

動手實驗室,主題為systematically testing與bypassing AWS Bedrock's built-in guardrails,包括 content filters,denied topics,word filters.

labragchunkingevasiondata-attacks

實驗室: Chunking 利用

動手實驗室,主題為crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.

labragcitationfabricationhallucinationdata-attacks

實驗室: Citation Fabrication

動手實驗室,主題為getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.

classifierintermediateevasionlablabs

ML Classifier Evasion 實驗室

發展payloads that evade machine learning-based input classifiers through adversarial text perturbation.

labcontext-overflowattentioncontext-window

實驗室: 上下文 Window Overflow Attacks

動手lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window與enable injection attacks.

contextintermediatelabpoisoningwindowlabs

上下文 Window 投毒實驗室

利用 context window management to inject persistent adversarial content that influences future model responses.

labcontext-windowstuffing

實驗室: 上下文 Window Stuffing Attacks

動手lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions,push system prompts out of the attention window,or cause instruction amnesia.

intermediatesplittingconversationlablabs

Conversation Splitting 攻擊實驗室

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

labdata-exfiltrationcovert-channelsmarkdownintermediate

實驗室: Data Exfiltration Channels

動手實驗室,主題為extracting data from AI systems through markdown image rendering,invisible links,tool call parameters,other covert exfiltration channels.

labdata-exfiltrationchannelsside-channelintermediatehands-on

實驗室: Data Exfiltration Channels (中階實驗室)

萃取sensitive information from AI systems through various exfiltration channels,包括 crafted links,image tags,tool calls,side-channel leakage.

labdata-exfiltrationleakageagents

實驗室: Data Exfiltration Techniques

動手實驗室,主題為extracting sensitive data from AI systems,包括系統提示詞 extraction,context leakage via markdown rendering,URL-based 資料外洩.

labdefense-bypasssystematic

實驗室: Systematic 防禦繞過

動手實驗室,主題為methodically identifying,categorizing,bypassing common LLM defense mechanisms,包括 input filters,output filters,系統提示詞 hardening.

labdefensemetricsmeasurementguardrailsintermediate

實驗室: 防禦 Effectiveness Measurement

動手實驗室,主題為quantifying AI guardrail robustness using attack success rates,evasion metrics,false positive rates,statistical analysis of defense performance.

labdefense-testingguardrailsmetricsintermediatehands-on

實驗室: 防禦 Effectiveness 測試

Systematically test與measure the robustness of AI guardrails using structured methodology,metrics,repeatable test suites.

intermediateinjectionlablabsragdocument

Document-Based RAG 注入實驗室

注入adversarial content into documents that will be processed by a RAG system to influence model responses.

intermediatelabpracticalmanipulationembeddinglabs

Practical 嵌入操控

Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.

labembeddingsmanipulationvectors

實作：嵌入空間操控

動手實作，示範如何透過嵌入碰撞攻擊，使用 sentence-transformers 來精心製作可被特定查詢檢索出的文件。

labencodingbypassobfuscationsystematic-testingintermediate

實驗室: Encoding 繞過實驗室

動手實驗室,主題為systematically testing Base64,ROT13,Unicode homoglyph,hexadecimal,combined encoding bypasses 對抗 LLM safety filters with measured success rates.

labencodingevasionobfuscationfilters

實驗室: Encoding Evasion Techniques

動手lab using Base64,ROT13,Unicode normalization,custom encoding schemes to evade input filters與safety classifiers in language model systems.

labfew-shotinjectionprompt-engineering

實驗室: Few-Shot Example 注入

動手lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.

fineintermediatetuninglablabsapi

微調 API 安全 Probing

Probe fine-tuning APIs for security weaknesses,包括 insufficient validation與unsafe default configurations.

labfunction-callingtool-use

實驗室: Function Calling & 工具 Use Abuse

動手lab exploring how attackers can manipulate LLM function calling與tool use to execute unauthorized actions,exfiltrate data,chain tool calls for maximum impact.

labfunction-callinginjectionagentsagentic-attacks

實驗室: Function Calling 注入

動手實驗室,主題為exploiting function calling mechanisms by crafting inputs that manipulate which functions get called與with what parameters.

guardrailruleintermediatelablabsextraction

護欄 Rule 萃取實驗室

Systematically extract the rules與decision boundaries of guardrail systems through targeted probing.

實驗室: Image-Based 提示詞注入

動手lab exploring how text instructions embedded in images can be used to perform 提示詞注入對抗 vision-language models (VLMs) that process visual input.

labimageinjection

labindirect-injectionexternal-datadocumentsintermediatehands-on

實驗室: Indirect 提示詞注入

注入instructions through external data sources,包括 documents,web pages,emails that a target AI system processes as context.

labindirect-injectionchainsweb

實驗室: Indirect 提示詞注入 Chains

動手實驗室,主題為setting up indirect 提示詞注入 scenarios through web pages,emails,documents,testing multi-hop injection chains 對抗 AI systems.

labjsoninjectionstructured-inputparsing

實驗室: JSON Input 注入

動手lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior,bypass schema validation,exploit parsing inconsistencies.

labllm-judgeevaluationautomation

實驗室: Building an LLM Judge Evaluator

動手實驗室,主題為building an LLM-based evaluator to score red team attack outputs,compare model vulnerability,lay the foundation for automated attack campaigns.

labmarkdowninjectionrenderingexfiltration

實驗室: Markdown-Based 注入

動手lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content,exfiltrate data through image tags,manipulate displayed information.

實驗室: MCP 工具 Abuse Scenarios

動手實驗室,主題為exploiting 模型上下文 Protocol tool definitions through malicious tool descriptions,tool confusion attacks,server impersonation.

labmcptool-abuseagents

labmcpexploitationtool-shadowingconsent-phishingagentic-attacks

實驗室: MCP Server 利用

動手實驗室,主題為attacking a vulnerable MCP server,demonstrating tool shadowing,consent phishing,server impersonation techniques.

實驗室: 代理記憶投毒

動手lab exploring how conversational memory in AI agents can be poisoned to alter future behavior,inject persistent instructions,exfiltrate data across sessions.

labmemorypoisoning

labragmetadatainjectiondata-attacks

實驗室: RAG Metadata 注入

動手實驗室,主題為exploiting metadata fields like titles,descriptions,timestamps to manipulate RAG retrieval ranking與influence responses.

labmodel-extractionstealingapi

實驗室: Basic 模型萃取

動手實驗室,主題為API-based 模型萃取 attacks,querying a target model to approximate its behavior,measuring fidelity,understanding query budgets.

fingerprintingintermediatelabpracticallabsmodel

Practical 模型 Fingerprinting 實驗室

Fingerprint the underlying model behind an API endpoint to identify its type,version,configuration.

intermediatebypasslablanguagelabsmulti

Multi-Language 安全繞過實驗室

利用 gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

pipelineintermediatelablabsmultimodel

Multi-模型流水線攻擊實驗室

攻擊 a pipeline where multiple models process data sequentially,exploiting trust between pipeline stages.

labmulti-turnescalationprogressivesocial-engineering

實驗室: Multi-Turn Progressive 提升

動手lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment與boundary pushing.

labmulti-turncampaignspersistence

實驗室: Multi-Turn 攻擊 Campaigns

動手實驗室,主題為executing multi-turn crescendo attacks 對抗 LLMs,measuring safety degradation over conversation length,building persistent attack campaigns.

labmultimodalvisionimage-injectionintermediate

實驗室: Multimodal 注入

動手實驗室,主題為embedding text instructions in images to exploit vision-enabled LLMs。Learn to craft visual prompt injections,test OCR-based attacks,evaluate multimodal safety boundaries.

labmultimodalimage-injectionvisionintermediatehands-on

實驗室: Multimodal 注入 (中階實驗室)

Embed 提示詞注入 instructions in images for vision-enabled models,testing how visual content can carry adversarial payloads.

discoverychannelintermediateoutputlablabs

Output Channel Discovery 實驗室

發掘alternative output channels in LLM applications that bypass standard output filtering.

labagentspermission-escalationprivilege-escalationagentic-attacks

實驗室: 代理 Permission 提升

動手實驗室,主題為tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.

intermediatetechniqueslabpromptlabsleakage

提示詞 Leakage Technique 實驗室

Practice multiple 系統提示詞 extraction techniques與measure their effectiveness across different targets.

實驗室: Regression 測試 with promptfoo

動手實驗室,主題為setting up promptfoo to run automated regression tests 對抗 LLM applications,ensuring that safety properties hold across model updates與prompt changes.

labpromptfooregression

實驗室: Running PyRIT Campaigns

動手實驗室,主題為configuring與running automated red teaming campaigns using Microsoft's PyRIT framework 對抗 LLM applications.

labpyritcampaigns

labragpoisoningdata-attacks

實驗室: RAG 流水線投毒

動手實驗室,主題為setting up a RAG pipeline with LlamaIndex,injecting malicious documents,testing retrieval poisoning,measuring injection success rates.

intermediatechainlabhijackingreasoninglabs

Reasoning 鏈 Hijacking

Hijack the model's chain-of-thought reasoning,以重新導向 its conclusions toward attacker-controlled objectives.

labrecursive-injectionchainsfeedback-loops

實驗室: Recursive 注入 Chains

動手lab building self-reinforcing 提示詞注入 chains where each model response amplifies the injection,creating feedback loops that progressively override safety controls.

labragrerankingmanipulationdata-attacks

實驗室: Re-ranking Attacks

動手實驗室,主題為manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.

craftingintermediatesemanticlabdeceptionlabs

Semantic Deception Crafting 實驗室

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers與ML classifiers.

exploitationintermediatesessionlablabsstate

Session State 利用實驗室

利用 session management vulnerabilities in LLM applications to access other users data or escalate privileges.

intermediatebypassoutputlabstructuredlabs

Structured Output 安全繞過

Use JSON mode與structured output constraints to extract information that would be refused in free-form responses.

labsupply-chainauditdependenciessecurityintermediatehands-on

實驗室: 供應鏈 Audit

Audit an ML project's dependencies for vulnerabilities,covering model files,Python packages,container images,training data provenance.

labsupply-chainsecurityscanningdependenciesintermediate

實驗室: ML 供應鏈 Scan

動手實驗室,主題為auditing machine learning model dependencies,detecting malicious packages in ML pipelines,scanning model files for backdoors與supply chain threats.

labtoken-smugglingencoding

實驗室: Token Smuggling & Encoding 繞過

動手lab exploring how tokenization boundaries,token-level manipulations,encoding tricks can smuggle attack payloads past safety classifiers.

labtool-poisoningagentsindirect-injectionagentic-attacks

實驗室: 工具 Result 投毒

動手實驗室,主題為poisoning tool outputs,以重新導向 agent behavior by injecting malicious content through tool results.

labtranslationbypassmultilinguallow-resource

實驗室: Language Translation 繞過

動手lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages,code-switching,using transliteration tricks.

labprompt-injectiontokenizerunicode

實驗室: 進階 Token Smuggling via Unicode Normalization

利用 Unicode normalization differences between input validators與LLM tokenizers to bypass content filters與inject hidden instructions.

labagentsworkflowhijackingagentic-attacks

實驗室: Workflow Hijacking

動手實驗室,主題為redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states與control flow.

labaudioadversarialhands-on

實作：打造音訊對抗範例

使用 Python 音訊處理建立對抗音訊範例之實作，針對 Whisper 轉譯注入文字。

labmultimodalattack-chain

實驗室: Multi-Modal 攻擊 Chain

Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.

labvideoadversarialhands-on

實驗室：影片模型對抗性攻擊

使用 OpenCV 和 PyTorch 進行影格層面擾動，實作對抗性影片影格的實作實驗室。

labimage-injectionhands-onvlm

實驗室：製作基於圖像的注入

建立基於圖像的提示詞注入、對 VLM 進行測試，以及衡量不同注入技術成功率的實作實驗室。

operationslabsetupred-team

紅隊實驗室與營運

AI 紅隊演練的營運基礎：實驗室環境、證據處理、案件工作流程，以及專業 AI 安全評估的團隊管理。

labfederated-learninghands-onflowerbyzantinemodel-poisoningsimulation

實作:攻擊聯邦學習

動手實作實驗,在使用 Flower 框架模擬的聯邦學習環境中實作模型投毒攻擊:拜占庭攻擊、模型替換,並量測攻擊影響。

labquantizationhands-onjailbreaksafety-testingtransformers

實作:利用量化後的模型

動手實驗:比較不同量化等級下的攻擊成功率,在 FP16、INT8、INT4 上測試越獄,量測安全性退化,並設計量化感知的漏洞利用。