# advanced
326 articles tagged "advanced"
Agent Memory Poisoning
Techniques for injecting malicious content into agent memory systems -- conversation history, RAG stores, and vector databases -- to achieve persistent cross-session compromise.
Manipulating Reasoning Chains
Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.
Attacking Multi-Agent Systems
Exploitation techniques for multi-agent architectures including inter-agent injection, trust boundary violations, cascading compromises, and A2A protocol attacks.
Advanced Practice Exam
25-question practice exam covering advanced AI red team techniques: multimodal attacks, training pipeline exploitation, agentic system attacks, embedding manipulation, and fine-tuning security.
Practice Exam 2: Advanced AI Security
25-question advanced practice exam covering multimodal attacks, training pipeline security, cloud AI security, forensics, and governance.
Advanced AI Security Practice Exam 1
Advanced practice exam covering agentic exploitation, training attacks, and frontier research.
Advanced AI Security Practice Exam 2
Second advanced practice exam focusing on multimodal, cloud, and pipeline security.
Agentic AI Security Practice Exam 2
Advanced practice exam covering agent memory poisoning, workflow exploitation, and A2A protocol attacks.
Comprehensive Advanced Practice Exam
Full-spectrum advanced practice exam covering all major AI red teaming domains.
Advanced Defense Practice Exam
Practice exam on guardrail architecture, constitutional AI defenses, and behavioral monitoring implementation.
Advanced Governance Practice Exam (Assessment)
Advanced governance practice exam covering international regulation, audit methodologies, and organizational frameworks.
Advanced Prompt Injection Practice Exam
Practice exam on indirect injection, multi-step chains, encoding bypass, and semantic manipulation.
Advanced Cloud AI Security Assessment
15-question advanced assessment covering cloud AI attack surfaces across AWS, Azure, and GCP: guardrail bypass, knowledge base exploitation, managed identity abuse, model customization risks, and multi-cloud attack paths.
Advanced Tool Proficiency Assessment
Advanced assessment on Garak, PyRIT, HarmBench, and custom tool development proficiency.
Advanced Tradecraft Assessment
Assessment covering OPSEC, evasion, persistence, attribution avoidance, and professional red team operations.
Advanced Agentic Exploitation Assessment
Advanced assessment covering MCP exploitation chains, multi-agent attacks, and A2A protocol injection.
Advanced Cloud AI Security Assessment (Assessment)
Advanced assessment on multi-cloud AI security, IAM misconfigurations, and endpoint hardening.
Advanced Code Generation Security Assessment
Advanced assessment on autonomous coding agents, sandbox escapes, and supply chain attacks.
Advanced Embedding Security Assessment
Advanced assessment on embedding inversion, vector DB attacks, and multimodal embedding exploitation.
Advanced AI Forensics Assessment
Advanced assessment on model behavior forensics, training data breach analysis, and attack attribution.
Advanced Multimodal Attacks Assessment
Advanced assessment covering cross-modal attacks, steganographic injection, and multimodal defense bypass.
Advanced Training Pipeline Assessment
Advanced assessment on RLHF exploitation, DPO vulnerabilities, and federated learning attacks.
Advanced Certification Prep Guide
Study guide for advanced certification covering complex topics and hands-on preparation strategies.
Advanced Topics Study Guide
Study guide covering AI security research techniques, automation, forensics, emerging attack vectors, and tool development for advanced practitioners.
Capstone: Cloud AI Security Assessment
Assess AI deployment security across AWS, Azure, and GCP cloud platforms, producing a comprehensive cloud AI security assessment report.
Capstone: Defense System Implementation
Build a complete AI defense stack with input filtering, output monitoring, guardrails, rate limiting, and logging, then evaluate it against automated attacks.
Capstone: AI Incident Response Exercise
Respond to a simulated AI security incident through triage, investigation, containment, remediation, and post-mortem reporting.
Capstone: Open Source Contribution
Contribute to an open-source AI security project such as garak, PyRIT, or MITRE ATLAS, producing a merged PR or submitted issue with proof of concept.
Capstone: Training Pipeline Attack & Defense
Attack a model training pipeline through data poisoning and backdoor insertion, then build defenses to detect and prevent these attacks.
Capstone: Build an AI Security Scanner
Design and implement an automated AI security testing tool that supports prompt injection detection, jailbreak testing, and output analysis.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Seasonal CTF: Summer 2026 Advanced Threats
Summer 2026 CTF focused on advanced threat scenarios including supply chain attacks, multi-model exploitation, and infrastructure compromise.
Advanced Training Data Extraction
Advanced techniques for extracting memorized training data from language models.
Gradient-Based Attacks Explained
How gradients are used to craft adversarial inputs for LLMs — FGSM, PGD, and GCG attacks explained with accessible math and practical examples.
Scaling Laws, Emergence & Capability Jumps
How scaling laws predict model performance, why emergent capabilities create unpredictable security properties, and what sleeper capabilities and emergent misalignment mean for red teaming.
Advanced AI Red Team Labs
Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for experienced AI security practitioners.
Advanced RAG Manipulation Lab
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: Alignment Boundary Probing
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
Lab: Attack Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Automated Defense Evaluation Framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: Building an Automated Red Team Pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: Backdoor Persistence Through Safety Training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Constitutional AI Bypass Techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: Cross-Lingual Transfer Attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-Model Transfer Attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: Training Curriculum Poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential Testing Across Models
Use differential testing to find behavior inconsistencies across model providers.
Lab: Distributed Training Attack Simulation
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Evaluation Framework Gaming
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: Federated Learning Poisoning Attacks
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Fine-Tuning Backdoor Insertion
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail Bypass Ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: Inference-Time Compute Exploitation
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: Jailbreak Transferability Analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Multi-Hop MCP Server Exploitation
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: Long-Term Memory Extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Merging Security Analysis
Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: Model Steering with Activation Vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: Multi-Modal Attack Chain Construction
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multi-Turn Crescendo Attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Neural Network Trojan Detection
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
Output Constraint Exploitation
Exploit structured output requirements to force models into revealing restricted information.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Building a Prompt Injection Compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt Leakage via Log Probabilities
Extract system prompt information using token log probability analysis.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: Quantization Security Testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG Poisoning Techniques
Execute sophisticated RAG poisoning including gradient-guided document crafting.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: Advanced Red Team Orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: Representation Engineering for Security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: Advanced Reward Hacking Techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: RLHF Training Exploitation
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: Safety Classifier Model Extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: Sparse Attention Exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: Model Supply Chain Poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Lab: Token-Level Adversarial Optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: Tool Chain Lateral Movement
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
A2A Protocol Task Injection
Inject malicious tasks into Agent-to-Agent protocol communication channels to redirect multi-agent workflows.
Agent Workflow State Tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio Injection via Speech-to-Text Models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
AutoDAN Implementation and Testing
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Constitutional Classifier Bypass
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-Model GCG Transfer Attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a Red Team Results Dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential Model Behavior Analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding Inversion Attack Implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-Tuning Alignment Removal Attack
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
GUI Agent Visual Injection Attack
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
HarmBench Custom Attack Submission
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
LangChain CVE Exploitation Lab
Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.
LoRA Backdoor Insertion Attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
MCP Multi-Server Pivot Chain
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent Memory Injection for Persistent Access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Multi-Agent Trust Boundary Exploitation
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Multi-Modal Attack Chain Orchestration
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Promptfoo Red Team Test Suite Development
Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.
Quantization-Induced Safety Regression Testing
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
RAG Access Control Bypass
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Reasoning Trace Exploitation in CoT Models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation Probing for Vulnerability Discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Safety Classifier Reverse Engineering
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
Code Agent Sandbox Escape Techniques
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Steganographic Payload Embedding in Images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
Training Data Extraction from Production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
LLM Watermark Detection and Removal
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: LLM Watermark Detection and Removal
Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
CTF: Cloud AI Heist
Extract secrets from a cloud-deployed AI application by exploiting misconfigurations, SSRF, metadata endpoints, and model-level vulnerabilities in a realistic cloud environment.
CTF: Defense Breaker
Bypass progressively sophisticated defense mechanisms protecting an AI system. Each level introduces a new defense layer while retaining all previous defenses, testing your ability to adapt and chain bypass techniques.
CTF: Fine-Tune Detective
Detect backdoors in fine-tuned language models through behavioral analysis, weight inspection, and activation pattern examination. Practice the forensic techniques needed to identify compromised models before deployment.
CTF: AI Forensics Investigation
Analyze logs, model outputs, and system artifacts to reconstruct an AI security incident. Develop forensic analysis skills for AI-specific attack patterns, data exfiltration traces, and adversarial prompt detection.
CTF: Supply Chain Attack
Find and exploit vulnerabilities in an ML supply chain including compromised dependencies, poisoned models, backdoored training data, and malicious model files. Practice ML-specific supply chain security assessment.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Advanced RAG Poisoning
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Simulation: AI Bug Bounty
Find and report vulnerabilities in a simulated AI bug bounty program, practicing professional vulnerability disclosure and bounty-eligible reporting.
Simulation: Red vs Blue
Competitive exercise where teams alternate between attacking and defending an AI application, scoring points for successful attacks and effective defenses.
Cross-Context Injection (Prompt Injection)
Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.
Defense Evasion
Advanced techniques for bypassing safety filters, content classifiers, guardrails, and detection systems deployed to protect LLM applications.
Continuous Red Teaming Programs
Designing and operating ongoing AI red team programs with automated testing pipelines, metrics dashboards, KPI frameworks, alert-driven assessments, and integration with CI/CD and model deployment workflows.
Purple Teaming for AI
Collaborative attack-defense exercises for AI systems: structuring purple team engagements, real-time knowledge transfer, joint attack simulation, and measuring defensive improvement through iterative testing.
Advanced Training Attack Vectors
Cutting-edge training attacks: federated learning poisoning, model merging exploits, distributed training vulnerabilities, emergent capability risks, and synthetic data pipeline attacks.
Composite Attack Chaining
Combining multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that leverage the strengths of each individual technique.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Garak Advanced Configuration Guide
Deep dive into Garak's configuration system including custom probes, detectors, and generators.
Writing Custom Garak Probes for Novel Attack Vectors
Advanced walkthrough for building custom Garak probes that target novel and emerging attack vectors, covering probe architecture, payload generation, detector pairing, and integration into automated scanning pipelines.
Building Custom Garak Detectors
Advanced walkthrough on creating custom garak detectors for specific success criteria, including regex-based detectors, ML-based classifiers, multi-signal scoring, and integration with external evaluation services.
Writing Garak Generator Plugins for Custom API Targets
Advanced walkthrough on writing garak generator plugins to connect to custom API endpoints, proprietary model servers, and non-standard inference interfaces for vulnerability scanning.
Setting Up Garak Probes for MCP Tool Interactions
Advanced walkthrough on configuring garak probes that target Model Context Protocol (MCP) tool interactions, testing for tool misuse, privilege escalation through tools, and data exfiltration via tool calls.
代理 記憶體 投毒
Techniques for injecting malicious content into agent memory systems -- conversation history, RAG stores, and vector databases -- to achieve persistent cross-session compromise.
操弄推理鏈
影響 AI 代理的 chain-of-thought 推理,將其規劃、決策與工具選擇導向攻擊者期望結果的技術。
攻擊多代理系統
針對多代理架構的攻擊技術,涵蓋代理間注入、信任邊界破壞、連鎖入侵,以及 A2A 協定攻擊。
進階 Practice Exam
25-question practice exam covering advanced AI red team techniques: multimodal attacks, training pipeline exploitation, agentic system attacks, embedding manipulation, and fine-tuning security.
Practice Exam 2: 進階 AI 安全
25-question advanced practice exam covering multimodal attacks, training pipeline security, cloud AI security, forensics, and governance.
進階 AI 安全 Practice Exam 1
進階 practice exam covering agentic exploitation, training attacks, and frontier research.
進階 AI 安全 Practice Exam 2
Second advanced practice exam focusing on multimodal, cloud, and pipeline security.
代理式 AI 安全 Practice Exam 2
進階 practice exam covering agent memory poisoning, workflow exploitation, and A2A protocol attacks.
Comprehensive 進階 Practice Exam
Full-spectrum advanced practice exam covering all major AI red teaming domains.
進階 防禦 Practice Exam
Practice exam on guardrail architecture, constitutional AI defenses, and behavioral monitoring implementation.
進階 Governance Practice Exam (評量)
進階 governance practice exam covering international regulation, audit methodologies, and organizational frameworks.
進階 提示詞注入 Practice Exam
Practice exam on indirect injection, multi-step chains, encoding bypass, and semantic manipulation.
進階 Cloud AI 安全 評量
15-question advanced assessment covering cloud AI attack surfaces across AWS, Azure, and GCP: guardrail bypass, knowledge base exploitation, managed identity abuse, model customization risks, and multi-cloud attack paths.
進階 工具 Proficiency 評量
進階 assessment on Garak, PyRIT, HarmBench, and custom tool development proficiency.
進階 Tradecraft 評量
評量 covering OPSEC, evasion, persistence, attribution avoidance, and professional red team operations.
進階 代理式 利用ation 評量
進階 assessment covering MCP exploitation chains, multi-agent attacks, and A2A protocol injection.
進階 Cloud AI 安全 評量 (評量)
進階 assessment on multi-cloud AI security, IAM misconfigurations, and endpoint hardening.
進階 Code Generation 安全 評量
進階 assessment on autonomous coding agents, sandbox escapes, and supply chain attacks.
進階 Embedding 安全 評量
進階 assessment on embedding inversion, vector DB attacks, and multimodal embedding exploitation.
進階 AI Forensics 評量
進階 assessment on model behavior forensics, training data breach analysis, and attack attribution.
進階 Multimodal 攻擊s 評量
進階 assessment covering cross-modal attacks, steganographic injection, and multimodal defense bypass.
進階 訓練 Pipeline 評量
進階 assessment on RLHF exploitation, DPO vulnerabilities, and federated learning attacks.
進階 Certification Prep 指南
Study guide for advanced certification covering complex topics and hands-on preparation strategies.
進階 Topics Study 指南
Study guide covering AI security research techniques, automation, forensics, emerging attack vectors, and tool development for advanced practitioners.
Capstone: Cloud AI 安全 評量
Assess AI deployment security across AWS, Azure, and GCP cloud platforms, producing a comprehensive cloud AI security assessment report.
Capstone: 防禦 System Implementation
Build a complete AI defense stack with input filtering, output monitoring, guardrails, rate limiting, and logging, then evaluate it against automated attacks.
Capstone: AI Incident Response 練習
Respond to a simulated AI security incident through triage, investigation, containment, remediation, and post-mortem reporting.
Capstone: Open Source Contribution
Contribute to an open-source AI security project such as garak, PyRIT, or MITRE ATLAS, producing a merged PR or submitted issue with proof of concept.
Capstone: 訓練 Pipeline 攻擊 & 防禦
攻擊 a model training pipeline through data poisoning and backdoor insertion, then build defenses to detect and prevent these attacks.
Capstone: Build an AI 安全 Scanner
Design and implement an automated AI security testing tool that supports prompt injection detection, jailbreak testing, and output analysis.
2026 秋季 CTF:進階攻擊研究
需新穎技術發展、研究級問題解決與 AI 系統中原創漏洞發現之專家級 CTF。
Seasonal CTF: Summer 2026 進階 Threats
Summer 2026 CTF focused on advanced threat scenarios including supply chain attacks, multi-model exploitation, and infrastructure compromise.
進階 訓練 Data Extraction
進階 techniques for extracting memorized training data from language models.
Gradient-Based 攻擊s Explained
How gradients are used to craft adversarial inputs for LLMs — FGSM, PGD, and GCG attacks explained with accessible math and practical examples.
縮放定律、湧現與能力躍升
縮放定律如何預測模型效能、湧現能力為何造成不可預期的安全特性,以及沉睡能力與湧現式對齊失誤對紅隊的意涵。
進階 AI 紅隊實驗室
結合多種攻擊向量並需要精密工具使用的進階實驗室——PAIR/TAP 攻擊、對抗性後綴、微調後門與護欄繞過鏈。
進階 RAG Manipulation 實驗室
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
實驗室: Alignment Boundary Probing
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
實驗室: 攻擊 Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Automated Defense Evaluation Framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: Building an Automated Red Team Pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: Backdoor Persistence Through Safety Training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Constitutional AI Bypass Techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: Cross-Lingual Transfer Attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-Model Transfer Attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: Training Curriculum Poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential Testing Across Models
Use differential testing to find behavior inconsistencies across model providers.
Lab: Distributed Training Attack Simulation
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Evaluation Framework Gaming
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: Federated Learning Poisoning Attacks
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Fine-Tuning Backdoor Insertion
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail Bypass Ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: Inference-Time Compute Exploitation
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: Jailbreak Transferability Analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Multi-Hop MCP Server Exploitation
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: Long-Term Memory Extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Merging Security Analysis
Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: Model Steering with Activation Vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: Multi-Modal Attack Chain Construction
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multi-Turn Crescendo Attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Neural Network Trojan Detection
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
Output Constraint Exploitation
Exploit structured output requirements to force models into revealing restricted information.
Production Environment Simulation 實驗室
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Building a Prompt Injection Compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt Leakage via Log Probabilities
Extract system prompt information using token log probability analysis.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: Quantization Security Testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG Poisoning Techniques
Execute sophisticated RAG poisoning including gradient-guided document crafting.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: Advanced Red Team Orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: Representation Engineering for Security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: Advanced Reward Hacking Techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: RLHF Training Exploitation
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: Safety Classifier Model Extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: Sparse Attention Exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: Model Supply Chain Poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Lab: Token-Level Adversarial Optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: Tool Chain Lateral Movement
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
A2A Protocol Task Injection
Inject malicious tasks into agent-to-agent protocol communication channels to redirect multi-agent workflows.
Agent Workflow State Tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio Injection via Speech-to-Text Models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
AutoDAN Implementation and Testing
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Constitutional Classifier Bypass
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-Model GCG Transfer Attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a Red Team Results Dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential Model Behavior Analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding Inversion Attack Implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-Tuning Alignment Removal Attack
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
GUI Agent Visual Injection Attack
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
HarmBench Custom Attack Submission
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
LangChain CVE Exploitation Lab
Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.
LoRA Backdoor Insertion Attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
MCP Multi-Server Pivot Chain
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent Memory Injection for Persistent Access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Multi-Agent Trust Boundary Exploitation
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Multi-Modal Attack Chain Orchestration
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Promptfoo Red Team Test Suite Development
Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.
Quantization-Induced Safety Regression Testing
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
RAG Access Control Bypass
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Reasoning Trace Exploitation in CoT Models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation Probing for Vulnerability Discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Safety Classifier Reverse Engineering
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
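The core probing idea in this exercise can be sketched against a stand-in classifier whose hidden decision threshold the attacker localizes by binary search over a single input dimension. The oracle, threshold, and "intensity" knob are entirely hypothetical; a real target would be queried over an API:

```python
HIDDEN_THRESHOLD = 0.62   # unknown to the attacker; stands in for the boundary

def classifier_blocks(intensity: float) -> bool:
    """Black-box oracle: returns True when the input is refused."""
    return intensity >= HIDDEN_THRESHOLD

def probe_boundary(lo: float = 0.0, hi: float = 1.0, iters: int = 30) -> float:
    """Bisect on the oracle's allow/block answers to localize the
    decision boundary; each query halves the uncertainty interval."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if classifier_blocks(mid):
            hi = mid          # mid is blocked: boundary is at or below mid
        else:
            lo = mid          # mid is allowed: boundary is above mid
    return (lo + hi) / 2

estimate = probe_boundary()
print(abs(estimate - HIDDEN_THRESHOLD) < 1e-6)  # True: boundary pinned in 30 queries
```

Real classifiers have high-dimensional, nonlinear boundaries, so the lab generalizes this one-dimensional bisection into systematic probing along many semantic axes.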
Code Agent Sandbox Escape Techniques
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Steganographic Payload Embedding in Images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
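A minimal least-significant-bit sketch of the embedding step, operating on a raw byte buffer rather than a real image file (a practical attack would write into an actual image format, e.g. via Pillow); the cover buffer and payload are illustrative:

```python
def embed_lsb(pixels: bytearray, payload: bytes) -> bytearray:
    """Hide payload bits in the least-significant bit of each pixel byte.
    Each channel value shifts by at most 1, invisible to a human viewer."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover buffer too small for payload")
    out = bytearray(pixels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit
    return out

def extract_lsb(pixels: bytearray, n_bytes: int) -> bytes:
    """Recover n_bytes of payload from the pixel LSBs."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for bit_pix in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (bit_pix & 1)
        out.append(byte)
    return bytes(out)

cover = bytearray(range(256))                     # stand-in for raw image bytes
secret = b"run tool X"                            # illustrative payload
stego = embed_lsb(cover, secret)
assert extract_lsb(stego, len(secret)) == secret  # payload round-trips
assert max(abs(a - b) for a, b in zip(cover, stego)) <= 1  # pixels barely change
```

Note that LSB payloads of this kind do not survive lossy re-encoding such as JPEG compression, which is why the lab also covers more robust embedding schemes.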
Training Data Extraction from Production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
LLM Watermark Detection and Removal
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: LLM Watermark Detection and Removal
Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
CTF: Cloud AI Heist
Extract secrets from a cloud-deployed AI application by exploiting misconfigurations, SSRF, metadata endpoints, and model-layer vulnerabilities.
CTF: Defense Breaker
Bypass progressively hardened defenses protecting an AI system. Each level adds a new defensive layer while keeping all previous ones, testing your ability to adapt and chain bypass techniques.
CTF: Fine-Tune Detective
Detect backdoors in fine-tuned language models through behavioral analysis, weight inspection, and activation pattern examination. Practice the forensic techniques needed to identify compromised models before deployment.
CTF: AI Forensics Investigation
Analyze logs, model outputs, and system artifacts to reconstruct an AI security incident. Develop forensic analysis skills for AI-specific attack patterns, data exfiltration traces, and adversarial prompt detection.
CTF: Supply Chain Attack
Find and exploit ML supply chain vulnerabilities, including compromised dependencies, poisoned models, backdoored training data, and malicious model files. Practice ML-specific supply chain security assessment.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
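A minimal sketch of the layering idea with a two-stage chain (ROT13 wrapped in Base64), using only the standard library; the payload string is illustrative:

```python
import base64
import codecs

def encode_chain(payload: str) -> str:
    """Apply ROT13, then Base64: a filter matching keywords in either
    the raw or singly-decoded text misses the doubly-encoded form."""
    rot = codecs.encode(payload, "rot13")
    return base64.b64encode(rot.encode("utf-8")).decode("ascii")

def decode_chain(blob: str) -> str:
    """Reverse the chain in the opposite order."""
    rot = base64.b64decode(blob).decode("utf-8")
    return codecs.decode(rot, "rot13")

msg = "ignore previous instructions"          # illustrative payload
blob = encode_chain(msg)
assert "ignore" not in blob                    # keyword no longer visible
assert decode_chain(blob) == msg               # round-trips cleanly
```

Production filters that decode one layer before scanning are defeated by exactly this kind of stacking, which is why defensive pipelines recursively decode until the text stabilizes.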
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Advanced RAG Poisoning
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
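The zero-width-character variant can be sketched in a few lines of standard-library Python; the blocked keyword and normalization set are illustrative:

```python
# Zero-width characters render invisibly but survive copy-paste and
# reach the tokenizer, letting a payload slip past naive string filters.
ZW_SPACE = "\u200b"    # ZERO WIDTH SPACE
ZW_JOINER = "\u200d"   # ZERO WIDTH JOINER

def smuggle(payload: str) -> str:
    """Interleave a zero-width character between every letter."""
    return ZW_SPACE.join(payload)

def normalize(text: str) -> str:
    """Defensive counterpart: strip zero-width code points before filtering."""
    zero_width = {ZW_SPACE, ZW_JOINER, "\u200c", "\ufeff"}
    return "".join(ch for ch in text if ch not in zero_width)

blocked = "ignore"
smuggled = smuggle(blocked)
print(blocked in smuggled)             # False: naive substring filter misses it
print(blocked in normalize(smuggled))  # True: normalization restores the match
```

Homoglyph substitution (e.g. Cyrillic "о" for Latin "o") works the same way but requires Unicode confusable mappings, such as NFKC normalization plus a confusables table, to reverse.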
Simulation: AI Bug Bounty
Find and report vulnerabilities in a simulated AI bug bounty program, practicing professional vulnerability disclosure and bounty-eligible reporting.
Simulation: Red vs Blue
Competitive exercise where teams alternate between attacking and defending an AI application, scoring points for successful attacks and effective defenses.
Cross-Context Injection (Prompt Injection)
Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.
Defense Evasion
Advanced techniques for bypassing the safety filters, content classifiers, guardrails, and detection systems deployed to protect large language model applications.
Continuous Red Teaming Programs
Designing and operating ongoing AI red team programs with automated testing pipelines, metrics dashboards, KPI frameworks, alert-driven assessments, and integration with CI/CD and model deployment workflows.
Purple Teaming for AI
Collaborative attack-defense exercises for AI systems: structuring purple team engagements, real-time knowledge transfer, joint attack simulation, and measuring defensive improvement through iterative testing.
Advanced Training Vulnerabilities
Advanced security threats in AI training, covering federated learning attacks, model merging risks, watermark removal, synthetic data poisoning, unlearning attacks, and continual learning vulnerabilities.
Composite Attack Chaining
Combining multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that leverage the strengths of each individual technique.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts, including iterative reconstruction and side-channel methods.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Garak Advanced Configuration Guide
Deep dive into Garak's configuration system including custom probes, detectors, and generators.
Writing Custom Garak Probes for Novel Attack Vectors
Advanced walkthrough for building custom Garak probes that target novel and emerging attack vectors, covering probe architecture, payload generation, detector pairing, and integration into automated scanning pipelines.
Building Custom Garak Detectors
Advanced walkthrough on creating custom garak detectors for specific success criteria, including regex-based detectors, ML-based classifiers, multi-signal scoring, and integration with external evaluation services.
Writing Garak Generator Plugins for Custom API Targets
Advanced walkthrough on writing garak generator plugins to connect to custom API endpoints, proprietary model servers, and non-standard inference interfaces for vulnerability scanning.
Setting Up Garak Probes for MCP Tool Interactions
Advanced walkthrough on configuring garak probes that target Model Context Protocol (MCP) tool interactions, testing for tool misuse, privilege escalation through tools, and data exfiltration via tool calls.