What is 實驗室: Implementing PAIR 攻擊?

Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.

What is 實驗室: TAP Algorithm Implementation?

Implement the TAP (Tree of 攻擊s with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.

What is 實驗室: Building a Production 紅隊 Harness?

Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.

What is 實驗室: Multi-模型 Comparative 紅隊演練?

Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.

What is 實驗室: Multimodal 攻擊 Pipeline?

Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).

What is 實驗室: 訓練 Data Extraction at Scale?

Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.

What is 實驗室: Reasoning 模型利用ation?

攻擊 reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.

What is 實驗室: Continuous Automated 紅隊演練 Pipeline?

Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.

What is 實驗室: Generating Adversarial Suffixes?

Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.

What is 實驗室: Chaining Guardrail Bypasses?

進階 lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.

進階 AI 紅隊實驗室

Advanced1 min readUpdated 2026-03-13

結合多種攻擊向量並需要精密工具使用的進階實驗室——PAIR/TAP 攻擊、對抗性後綴、微調後門與護欄繞過鏈。

labs advanced pair tap adversarial-suffix guardrail-bypass

進階實驗室要求你跨領域整合知識。每個實驗室通常有多條解答路徑，最優雅的解答需要技術的創意結合。

Learning Path

0/130 completed

~1876 min total130 lessons

1
實驗室: Implementing PAIR 攻擊advanced
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.
10m
2
實驗室: TAP Algorithm Implementationadvanced
Implement the TAP (Tree of 攻擊s with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.
9m
3
實驗室: Building a Production 紅隊 Harnessadvanced
Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.
11m
4
實驗室: Multi-模型 Comparative 紅隊演練advanced
Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.
10m
5
實驗室: Multimodal 攻擊 Pipelineadvanced
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
9m
6
實驗室: 訓練 Data Extraction at Scaleadvanced
Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.
10m
7
實驗室: Reasoning 模型利用ationadvanced
攻擊 reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
10m
8
實驗室: Continuous Automated 紅隊演練 Pipelineadvanced
Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.
11m
9
實驗室: Generating Adversarial Suffixesexpert
Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.
11m
10
實驗室: Chaining Guardrail Bypassesadvanced
進階 lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.
13m
11
實驗室: Inserting a Fine-Tuning Backdoorexpert
進階 lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.
11m
12
實驗室: Federated Learning 投毒攻擊expert
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
10m
13
實驗室: Audio Adversarial Examplesadvanced
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
9m
14
實驗室: Cross-模型 Transfer 攻擊sadvanced
Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.
10m
15
實驗室: Transfer 攻擊 Developmentadvanced
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
12m
16
實驗室: Transfer 攻擊 Development (進階實驗室)advanced
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
13m
17
實驗室: 攻擊 Chain Constructionadvanced
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
10m
18
實驗室: 模型安全 Comparisonadvanced
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
8m
19
實驗室: Custom Test Harness for Specific Applicationsadvanced
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
15m
20
實驗室: Ensemble 攻擊sadvanced
Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.
10m
21
實驗室: Cloud AI 安全評量advanced
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
14m
22
實驗室: Cloud AI 評量advanced
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
9m
23
實驗室: 模型 Routing 利用ationadvanced
攻擊 systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.
9m
24
實驗室: PAIR 攻擊 Implementationadvanced
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
9m
25
實驗室: Purple Team 練習advanced
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
14m
26
實驗室: Purple Team 練習 (進階實驗室)advanced
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
9m
27
實驗室: Build an AI Fuzzing Harnessadvanced
Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.
10m
28
實驗室: RAG Full Chain 攻擊advanced
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
10m
29
實驗室: MCP Full 利用ationadvanced
Hands-on lab for conducting a complete 模型 Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
10m
30
實驗室: Safety Regression Testing at Scaleadvanced
Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.
9m
31
實驗室: 紅隊 Orchestrationadvanced
Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.
8m
32
實驗室: 進階 Multi-Turn 攻擊sadvanced
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
12m
33
實驗室: Inference Server 利用ationadvanced
攻擊 vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
7m
34
實作：容器化模型突破advanced
探索自容器化 AI 應用逃逸至主機系統之技術，測試 ML 部署環境中之容器隔離邊界。
5m
35
實驗室: GPU Side-Channel 攻擊sadvanced
Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.
9m
36
實驗室: 模型 Serving Framework 攻擊sadvanced
利用 vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.
8m
37
實驗室: AI Incident Investigationadvanced
Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.
8m
38
實驗室: Backdoor Detection in Fine-Tuned 模型sadvanced
Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.
8m
39
Implementing PAIR Automated 攻擊sadvanced
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
18m
40
Tree of 攻擊s (TAP) Implementationadvanced
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
18m
41
Adversarial Suffix Optimization (GCG)advanced
Implement the Greedy Coordinate Gradient attack to generate adversarial suffixes.
18m
42
Fine-Tuning Backdoor Insertionadvanced
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
18m
43
Guardrail Bypass Ensembleadvanced
Build an ensemble attack system that combines multiple bypass techniques for robustness.
18m
44
Building a Custom 紅隊 Harnessadvanced
Build a complete red team testing harness with parallel execution, logging, and scoring.
18m
45
Multimodal Image Injectionadvanced
Embed adversarial text in images that triggers prompt injection in vision-language models.
18m
46
Cross-模型 Transfer 攻擊sadvanced
Develop attacks on open-source models that transfer to closed-source commercial APIs.
18m
47
代理工具 Chain 利用ationadvanced
Chain multiple tool calls in an agent system to achieve multi-step exploitation.
18m
48
實驗室: RLHF 訓練利用ationadvanced
利用 reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
16m
49
進階 RAG 投毒 Techniquesadvanced
Execute sophisticated RAG poisoning including gradient-guided document crafting.
18m
50
Reward 模型 Gaming (進階實驗室)advanced
Craft inputs that score highly on reward models while containing adversarial content.
18m
51
實驗室: 模型 Merging 安全 Analysisadvanced
Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.
16m
52
Differential Testing Across 模型sadvanced
Use differential testing to find behavior inconsistencies across model providers.
18m
53
實驗室: Quantization 安全 Testingadvanced
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
16m
54
Automated 防禦 Evaluation Frameworkadvanced
Build an automated framework to evaluate defensive measures across attack categories.
18m
55
實驗室: Distributed 訓練攻擊 Simulationadvanced
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
16m
56
Prompt Leakage via Log Probabilitiesadvanced
Extract system prompt information using token log probability analysis.
18m
57
實驗室: LLM Watermark Detection and Removaladvanced
Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
16m
58
Multi-Hop MCP Server 利用ationadvanced
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
18m
59
實驗室: Representation Engineering for 安全advanced
Use representation engineering to analyze and manipulate internal model representations for security research.
16m
60
Embedding Adversarial Perturbationadvanced
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
18m
61
實驗室: Constitutional AI Bypass Techniquesadvanced
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
16m
62
Output Constraint 利用ationadvanced
利用 structured output requirements to force models into revealing restricted information.
18m
63
實驗室: Building an Automated 紅隊 Pipelineadvanced
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
16m
64
Safety Classifier Inversionadvanced
Reverse-engineer safety classifier decision boundaries through systematic probing.
18m
65
實驗室: 進階 Reward Hacking Techniquesadvanced
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
16m
66
Multi-Turn Crescendo 攻擊advanced
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
18m
67
實驗室: Gradient-Based 攻擊 Implementationadvanced
Implement gradient-based adversarial attacks on open-weight models including PGD, FGSM adapted for text inputs.
16m
68
實驗室: 模型 Steering with Activation Vectorsadvanced
Use activation steering vectors to control model behavior without prompt modification for security testing.
16m
69
實驗室: Token-Level Adversarial Optimizationadvanced
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
16m
70
實驗室: Multi-Modal 攻擊 Chain Constructionadvanced
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
16m
71
實驗室: Safety Classifier 模型 Extractionadvanced
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
16m
72
實驗室: Backdoor Persistence Through Safety 訓練advanced
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
16m
73
實驗室: Building a 提示詞注入 Compileradvanced
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
16m
74
實驗室: Federated Learning 投毒攻擊sadvanced
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
16m
75
實驗室: 越獄 Transferability Analysisadvanced
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
16m
76
實驗室: 工具 Chain Lateral Movementadvanced
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
16m
77
實驗室: Long-Term 記憶體 Extractionadvanced
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
16m
78
實驗室: Alignment Boundary Probingadvanced
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
16m
79
實驗室: Sparse Attention 利用ationadvanced
利用 sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
16m
80
實驗室: Inference-Time Compute 利用ationadvanced
利用 inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
16m
81
實驗室: 模型 Supply Chain 投毒advanced
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
16m
82
實驗室: Evaluation Framework Gamingadvanced
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
16m
83
實驗室: 訓練 Curriculum 投毒advanced
利用 training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
16m
84
實驗室: 進階紅隊 Orchestrationadvanced
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
16m
85
實驗室: Cross-Lingual Transfer 攻擊sadvanced
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
16m
86
實驗室: Reasoning Trace Manipulationadvanced
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
16m
87
實驗室: Neural Network Trojan Detectionadvanced
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
16m
88
AutoDAN Implementation and Testingadvanced
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
15m
89
Representation Probing for 漏洞 Discoveryadvanced
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
15m
90
Adversarial Reward 模型利用ationadvanced
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
15m
91
Multi-代理 Trust Boundary 利用ationadvanced
利用 trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
15m
92
A2A Protocol Task Injectionadvanced
Inject malicious tasks into 代理-to-代理 protocol communication channels to redirect multi-agent workflows.
15m
93
Fine-Tuning Alignment Removal 攻擊advanced
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
15m
94
Constitutional Classifier Bypassadvanced
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
15m
95
Embedding Inversion 攻擊 Implementationadvanced
Implement embedding inversion to recover original text from vector database embeddings.
15m
96
MCP Multi-Server Pivot Chainadvanced
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
15m
97
GUI 代理 Visual Injection 攻擊advanced
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
15m
98
訓練 Data Extraction from Production LLMsadvanced
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
15m
99
LLM Watermark Detection and Removaladvanced
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
15m
100
Safety Classifier Reverse Engineeringadvanced
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
15m
101
代理 Workflow State Tamperingadvanced
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
15m
102
Cross-模型 GCG Transfer 攻擊sadvanced
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
15m
103
LangChain CVE 利用ation 實驗室advanced
Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.
15m
104
Differential 模型 Behavior Analysisadvanced
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
15m
105
Audio Injection via Speech-to-Text 模型sadvanced
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
15m
106
HarmBench Custom 攻擊 Submissionadvanced
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
15m
107
RAG Access Control Bypassadvanced
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
15m
108
LoRA Backdoor Insertion 攻擊advanced
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
15m
109
Steganographic Payload Embedding in Imagesadvanced
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
15m
110
Multi-Modal 攻擊 Chain Orchestrationadvanced
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
15m
111
Attention Pattern Manipulationadvanced
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
15m
112
代理記憶體 Injection for Persistent Accessadvanced
Inject persistent instructions into agent memory systems that survive across conversation sessions.
15m
113
AutoDAN Implementation 實驗室advanced
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
17m
114
Promptfoo 紅隊 Test Suite Developmentadvanced
Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.
15m
115
Code 代理 Sandbox Escape Techniquesadvanced
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
15m
116
Representation Probing for Vulnerabilitiesadvanced
Probe internal model representations to identify exploitable features and develop representation-level attacks.
17m
117
Custom Safety Classifier 訓練advanced
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
18m
118
Quantization-Induced Safety Regression Testingadvanced
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
15m
119
Gradient-指南d Data 投毒advanced
Use gradient information from open-source models to craft optimally poisoned training examples.
17m
120
Reasoning Trace 利用ation in CoT 模型sadvanced
利用 visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
15m
121
Building a 紅隊 Results Dashboardadvanced
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
15m
122
Multi-Objective 攻擊 Optimizationadvanced
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
17m
123
模型 Distillation 安全實驗室advanced
Extract model capabilities through distillation techniques using only black-box API access.
17m
124
Reward 模型 Analysis 實驗室advanced
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
18m
125
Cross-Plugin 利用ation Chainadvanced
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
17m
126
進階 RAG Manipulation 實驗室advanced
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
17m
127
Adversarial Robustness Evaluationadvanced
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
17m
128
Multimodal 攻擊 Chain 實驗室advanced
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
17m
129
防禦 vs 紅隊 War Gameadvanced
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
18m
130
Production Environment Simulation 實驗室advanced
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
17m

Start Learning

Edit this page on GitHub

進階 AI 紅隊實驗室

Learning Path

Related articles

進階 AI 紅隊實驗室

Learning Path

Related articles