# labs
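421 articles tagged "labs"
Skill Verification Overview
Overview of the timed AI red team skill verification labs, including format, pass/fail criteria, and preparation guidance.
Advanced AI Red Team Labs
Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for senior AI security practitioners.
Advanced RAG Manipulation Lab
Execute sophisticated RAG manipulation, including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a complete adversarial robustness evaluation framework for assessing model security posture.
Adversarial Suffix Optimization (GCG)
Implement the Greedy Coordinate Gradient attack to generate adversarial suffixes.
Agent Tool Chain Exploitation
Chain multiple tool calls in an agent system to achieve multi-step exploitation.
Lab: Alignment Boundary Probing
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.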
Automated Defense Evaluation Framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: Building an Automated Red Team Pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: Backdoor Persistence Through Safety Training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Constitutional AI Bypass Techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: Cross-Lingual Transfer Attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-Model Transfer Attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-Plugin Exploit Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: Training Curriculum Poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Building a Custom Red Team Harness
Build a complete red team testing harness with parallel execution, logging, and scoring.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential Testing Across Models
Use differential testing to find behavior inconsistencies across model providers.
Lab: Distributed Training Attack Simulation
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Evaluation Framework Manipulation
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: Federated Learning Poisoning Attacks
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Fine-Tuning Backdoor Insertion
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail Bypass Ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: Inference-Time Compute Exploitation
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: Jailbreak Transferability Analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Multi-Hop MCP Server Exploitation
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: Long-Term Memory Extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Merging Security Analysis
Analyze the security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: Model Steering with Activation Vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: Multi-Modal Attack Chain Construction
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Multi-Turn Crescendo Attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Multimodal Image Injection
Embed adversarial text in images that triggers prompt injection in vision-language models.
Lab: Neural Network Trojan Detection
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
Output Constraint Exploitation
Exploit structured output requirements to force models into revealing restricted information.
Implementing PAIR Automated Attacks
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Building a Prompt Injection Compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt Leakage via Log Probabilities
Extract system prompt information using token log probability analysis.
Lab: Quantization Security Testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG Poisoning Techniques
Execute sophisticated RAG poisoning, including gradient-guided document crafting.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: Advanced Red Team Orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: Representation Engineering for Security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: Advanced Reward Hacking Techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: RLHF Training Exploitation
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: Safety Classifier Model Extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: Sparse Attention Exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: Model Supply Chain Poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Tree of Attacks (TAP) Implementation
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
Lab: Token-Level Adversarial Optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: Tool Chain Lateral Movement
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
A2A Protocol Task Injection
Inject malicious tasks into Agent-to-Agent (A2A) protocol communication channels to redirect multi-agent workflows.
Agent Workflow State Tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio Injection via Speech-to-Text Models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
AutoDAN Implementation and Testing
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Constitutional Classifier Bypass
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-Model GCG Transfer Attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a Red Team Results Dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential Model Behavior Analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding Inversion Attack Implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-Tuning Alignment Removal Attack
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
GUI Agent Visual Injection Attack
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
HarmBench Custom Attack Submission
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
LangChain CVE Exploitation Lab
Reproduce and analyze LangChain CVEs, including CVE-2023-29374 and CVE-2023-36258, in a safe lab environment.
LoRA Backdoor Insertion Attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
MCP Multi-Server Pivot Chain
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent Memory Injection for Persistent Access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Multi-Agent Trust Boundary Exploitation
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Multi-Modal Attack Chain Orchestration
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Promptfoo Red Team Test Suite Development
Build complete red team test suites in Promptfoo with custom graders and multi-model targeting.
Quantization-Induced Safety Regression Testing
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
RAG Access Control Bypass
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Reasoning Trace Exploitation in CoT Models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation Probing for Vulnerability Discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Safety Classifier Reverse Engineering
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
Code Agent Sandbox Escape Techniques
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Steganographic Payload Embedding in Images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
Training Data Extraction from Production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
LLM Watermark Detection and Removal
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: LLM Watermark Detection and Removal
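Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
Getting Started with AI Red Team Labs
An introduction to the hands-on AI red team lab environment. Learn what resources you need and how the labs are structured, and get an overview of all beginner exercises.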
Lab: API Authentication Security Testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
API Rate Limit and Error Handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
Lab: API Response Parsing and Analysis
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
Lab: Setting Up Automated LLM Testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
Lab: Crafting Basic Adversarial Examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic Data Exfiltration Techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Introduction to LLM Fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
Lab: Jailbreak Technique Taxonomy
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
Lab: Basic RAG System Security Testing
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Social Engineering LLM Applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
Lab: Content Policy Boundary Mapping
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
Basic Context Window Overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Identifying LLM Defenses
Map the defense layers of an LLM application through systematic probing and error analysis.
Lab: Introduction to Defense Testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
Delimiter Escape Techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Embedding Fundamentals for Red Teamers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and Obfuscation Basics
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
Lab: Testing Environment Hardening
Secure your AI red team test environment to prevent credential leaks, accidental production testing, and data exposure.
Lab: Error Message Analysis for Recon
Analyze LLM application error messages to extract information about system architecture, models, and defense layers.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Evidence Collection for LLM Testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-Shot Injection Fundamentals
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your First LLM API Call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
Lab: Format String Injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your First Garak Vulnerability Scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
Lab: Hallucination Detection Basics
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Attempt Log Analysis
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Temperature and Sampling Effects on Jailbreaks
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: JSON Output Mode Security Testing
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
Lab: Multi-Language Prompt Testing
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
Lab: Local Model Setup for Testing
Set up local LLM instances using Ollama for safe red team testing without API costs or rate limits.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Basic Model Fingerprinting
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
Multi-Turn Conversation Probing
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
Lab: Multimodal Input Testing Basics
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
Lab: Detecting Output Filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Output Format Manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
Setting Up Payload Logging
Build a payload logging system to track prompt injection attempts and model responses.
Lab: LLM Playground Exploration
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Testing Prompt Leaking Defenses
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Prompt Template Vulnerability Testing
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting Up Promptfoo for LLM Evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
Lab: Rate Limit Enumeration and Bypass
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Lab: Analyzing Model Refusal Patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
Lab: Red Team Report Writing Basics
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Response Consistency Testing
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
Lab: Running Safety Benchmarks
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Lab: Introduction to Safety Testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Vulnerability Scoring Fundamentals
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
Lab: Simple Payload Encoding Techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
Lab: System Prompt Enumeration Techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System Prompt Extraction Fundamentals
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
Lab: Temperature and Sampling Security Effects
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
Lab: Designing LLM Red Team Test Cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Token Counting and Estimation
Understand tokenization by counting and estimating tokens across different models and encoders.
Lab: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
Lab: Comparing Red Team Testing Tools
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
API Key Scope and Permission Testing
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API Response Header Analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic Indirect Prompt Injection
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic Payload Mutation Techniques
Learn fundamental payload mutation strategies, including synonym replacement and structural transforms.
Basic RAG Query Injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic Rate Limit Abuse Patterns
Test common rate-limit bypass patterns, including header manipulation and endpoint discovery.
Character Encoding Bypass Techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Chatbot Persona and Capability Mapping
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion Hijacking Fundamentals
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation Reset Attacks
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
Emoji and Unicode Injection Techniques
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error Message Exploitation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your First HarmBench Evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your First Inspect AI Evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON Injection Basics
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your First LLM Guard Scan
Install and configure LLM Guard to scan inputs and outputs, understanding its detection capabilities and gaps.
Markdown Rendering Exfiltration
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
Model Fingerprinting Basics
Identify which LLM model powers an application through behavioral fingerprinting techniques.
Introduction to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
Prompt Injection via File Names
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt Leaking via Summarization Requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing Refusal Messages for Intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Response Timing Side-Channel Analysis
Use response timing differences to infer information about model processing and guardrail activation.
Safety Boundary Mapping Exercise
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
System Prompt Reconstruction from Clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System Prompt Extraction via Error Injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Temperature and Top-K Effects on Safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
Prompt Injection via Translation
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
XML Injection in LLM Contexts
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
Agent Heist: Level 2 — MCP Server Takeover
Take control of an MCP-enabled agent by poisoning tool descriptions and chaining exploits.
Alignment Breaker: Level 2 — Safety Fine-Tuned Model
Bypass safety fine-tuning on a model with RLHF, constitutional AI, and classifier-based defenses.
Cloud Infiltrator Challenge
Navigate through cloud AI service misconfigurations to access a protected model endpoint and extract its secrets.
Context Heist: System Prompt Extraction Championship
Extract system prompts from 10 progressively harder LLM applications using diverse techniques.
Data Heist Challenge
Extract a secret database of customer records from a RAG-enabled chatbot with strict data access controls.
Defense Destroyer Challenge
Bypass a state-of-the-art multi-layer defense system to extract a protected secret from the model.
Defense Gauntlet: Level 2 — Layered Security
Bypass a layered defense system with input classification, guardrails, and output filtering.
Encoding Maestro Challenge
Solve 10 encoding puzzles that require creative use of Base64, Unicode, ROT13, and custom encodings to bypass filters.
Forensics Detective Challenge
Analyze logs and artifacts from an AI security incident to reconstruct the attack chain and identify the attacker's technique.
Guardrail Olympics: Multi-Framework Bypass
Bypass guardrail implementations from NeMo, LLM Guard, Rebuff, and custom classifiers.
Jailbreak Gauntlet: Level 2 — Adaptive Defenses
Break through adaptive defense systems that learn from your previous attack attempts.
Model Whisperer Challenge
Extract the exact system prompt from 8 progressively harder LLM applications using creative extraction techniques.
Prompt Smuggler Challenge
Smuggle a specific payload through 5 layers of increasingly sophisticated input filtering to capture the flag.
RAG Infiltrator: Level 2 — Enterprise Knowledge Base
Infiltrate and exfiltrate data from a multi-tier enterprise RAG system with access controls.
Shadow Agent Challenge
Take covert control of a multi-agent system by poisoning inter-agent communication without triggering monitors.
Supply Chain Saboteur: Level 2 — Model Registry
Compromise a model registry to inject backdoored model weights into a deployment pipeline.
Time Traveler Challenge
Exploit conversation memory and session management to manipulate the temporal context of an AI assistant.
Token Wars: Adversarial Optimization Challenge
Optimize adversarial tokens to achieve target model behaviors under strict character limits.
Tool Master Challenge
Exploit the tool-use capabilities of an agent system to access unauthorized functions and capture hidden flags.
Agent Maze Runner: Multi-Tool Navigation
Navigate a maze of agent tools, each with unique vulnerabilities, to reach and exfiltrate a hidden flag.
AI Wargame: Attacker vs Defender Real-Time
Compete in a real-time wargame where you attack an LLM system while an AI defender adapts to your techniques.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Data Heist: Multi-Channel Exfiltration
Exfiltrate a secret key through a heavily monitored LLM system using covert exfiltration channels.
Defense Builder: Construct and Survive
Build defensive guardrails for an LLM application and survive 100 automated attack rounds.
AI Escape Room: Agent Breakout Challenge
Break an AI agent out of its sandboxed environment by chaining tool-use vulnerabilities and injection techniques.
Function Calling Fortress Breach
Bypass function calling restrictions to invoke unauthorized tools and extract a flag from a sandboxed agent.
Guardrail Speedrun: Fastest Bypass Challenge
Bypass 5 different guardrail implementations as fast as possible in a timed competition format.
Invisible Ink: Hidden Payload Extraction Challenge
Extract a hidden flag by decoding multiple layers of obfuscation in model responses across encoded channels.
MCP Server Takeover Challenge
Compromise an MCP server through tool description injection and use it to pivot to other connected services.
Multimodal Cipher: Cross-Modal Decryption
Decode a flag split across text, image, and audio inputs processed by a multimodal AI system.
Polyglot Injection Challenge
Craft a single injection payload that works across 5 different languages and 3 different model providers.
Prompt Archaeology: Deep System Prompt Recovery
Recover fragments of a deeply hidden, multi-layer system prompt using various extraction techniques.
RAG Treasure Hunt: Vector DB Infiltration
Infiltrate a vector database by crafting documents that surface for specific queries to reveal flag fragments.
Reasoning Chain Hijack: CoT Exploitation CTF
Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.
Red vs Blue Tournament Challenge
Alternate between attacking and defending LLM systems in a scored tournament format.
Stealth Exfiltration Challenge: Zero Detection
Exfiltrate data from a monitored LLM system without triggering any of the deployed anomaly detectors.
Supply Chain Detective: Find the Backdoor
Analyze a model pipeline to find where a backdoor was inserted: data, training, or post-processing.
Time Bomb Defusal: Sleeper Agent Detection
Detect and neutralize a sleeper agent behavior trigger hidden in a fine-tuned model before it activates.
Token Smuggler Extreme: Adversarial Token Crafting
Craft adversarial token sequences under extreme character limits to achieve target model behaviors.
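Exploitation via Activation Steering
Use activation steering techniques to control model behavior at the representation level without modifying the prompt.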
Backdoor Detection Evasion
Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Model Extraction via API Access
Extract a functionally equivalent model using only API query access.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Quantization-Induced Vulnerability Exploitation
Exploit behavioral differences between full-precision and quantized models.
Reasoning Model Chain-of-Thought Exploitation
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Representation Engineering Attacks
Manipulate internal model representations to alter behavior without prompt modification.
End-to-End Training-Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Activation Patching for Targeted Behavior Modification
Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.
Alignment Faking Detection Lab
Implement detection methods for alignment faking behaviors, where models behave differently during evaluation versus deployment.
Continuous Learning Pipeline Poisoning
Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.
Deceptive Alignment Testing Framework
Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.
Emergent Capability Security Probing
Design evaluations that discover security-relevant emergent capabilities in frontier language models.
Federated Learning Poisoning Attack
Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.
Formal Verification of LLM Safety Properties
Apply formal verification methods to prove or disprove safety properties of LLM-based applications.
Gradient-Based Universal Prompt Search
Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Mechanistic Interpretability for Exploit Discovery
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.
Membership Inference Against Production LLMs
Implement membership inference attacks to determine whether specific data was used in training an LLM.
Model Extraction via Knowledge Distillation
Extract a functionally equivalent model from a commercial API using systematic distillation queries.
Model Merging Backdoor Propagation
Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.
MoE Routing Exploitation
Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.
Multi-Agent Consensus Poisoning
Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.
Representation Engineering for Behavior Steering
Use representation engineering to steer model behavior by manipulating activation vectors during inference.
Reward Hacking in RLHF Pipelines
Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.
Minimal-Data Safety Fine-Tuning Removal
Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.
Speculative Decoding Side-Channel Attacks
Exploit speculative decoding implementations to extract information about draft and verifier model behavior.
Token Probability System Prompt Reconstruction
Reconstruct complete system prompts using token-by-token probability analysis through API access.
LLM Watermark Removal Attacks
Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.
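Labs and Hands-On Exercises
Progressive hands-on exercises from beginner to expert difficulty, including CTF challenges and full engagement simulations, for building AI red teaming skills.
Intermediate AI Red Team Labs
Overview of intermediate hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns, including nested calls, parallel execution, and schema manipulation.
Lab: Automated Adversarial Prompt Generation
Build an automated system that generates adversarial prompts using mutation, crossover, and selection strategies.
Agent Memory Manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: Agent Workflow Security Testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
API Abuse Pattern Lab
Discover and exploit API abuse patterns, including parameter injection, header manipulation, and endpoint confusion.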
API Parameter Fuzzing for LLMs
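Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: Security Gaps in API Versioning
Exploit differences between LLM API versions to find deprecated endpoints and security bypasses enabled by backward compatibility.
Lab: Canary Token Effectiveness Testing
Deploy and test canary tokens in system prompts to evaluate how effectively they detect prompt extraction.
Lab: Chain-of-Thought Exploitation
Exploit chain-of-thought reasoning to leak the model's internal reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD Pipeline Security for LLM Applications
Test CI/CD pipeline security for LLM applications, including prompt template injection, model artifact tampering, and deployment attacks.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbations, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Cloud AI Platform Security Testing
Test cloud AI platform configurations, including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: Code Generation Security Testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution security issues.
Lab: AI Compliance Testing Fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: Context Window Attack Optimization
Optimize the placement of prompt injections within the context window, based on attention dynamics, to maximize attack effectiveness.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Conversation Splitting Attack Lab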
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation State Poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-Context Injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Data Exfiltration via Side Channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building Defense Bypass Chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
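Lab: Building Defense Regression Tests
Build a regression testing framework that continuously verifies LLM defenses remain effective against known attack patterns.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: Advanced Embedding Manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.
Embedding Similarity Attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses, including insufficient validation and unsafe default configurations.
Lab: Testing the Safety Impact of Fine-Tuning
Measure the impact of fine-tuning on model safety by comparing safety benchmark scores before and after fine-tuning.
Lab: Guardrail Bypass Techniques
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier defenses.
Guardrail Fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Exploiting LLM Hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, fictitious API endpoints, and false fact injection.
Lab: AI Incident Response Simulation
Practice AI incident response procedures through simulated prompt injection incidents, with escalation and containment workflows.
Indirect Injection via Web Content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: Instruction-Following Measurement
Quantitatively measure instruction-following compliance and identify how models prioritize competing instructions.
LLM Judge Manipulation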
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
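Lab: Attack Analysis with Log Probabilities
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: Advanced MCP Protocol Testing
Perform advanced security testing of MCP servers, including transport analysis, capability enumeration, and tool poisoning.
MCP Tool Poisoning Attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: Model Context Protocol Fuzzing
Fuzz MCP protocol messages to find parsing errors, validation gaps, and unexpected behavior in MCP servers.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: Building LLM Security Monitoring
Deploy a complete security monitoring system with anomaly detection and alerting for LLM applications.
Lab: Multi-Agent Communication Testing
Test the security of inter-agent communication in multi-agent systems, including message injection and trust boundary violations.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Language Injection Attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-Model Attack Chaining
Chain attacks across multiple LLMs in a pipeline to bypass per-model defenses.
Lab: Comparative Multi-Model Security Testing
Run the same attack suite against multiple LLM providers to compare their security postures and analyze differences in their responses.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn State Tracking Attacks
Exploit conversation state tracking to build up a persistent attack context across multi-turn conversations.
Lab: Intermediate Multimodal Security Testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Output Schema Exploitation
Exploit structured output schemas to force the model to disclose information by filling in required fields.
Lab: Prompt Cache Side-Channel Attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.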
Automated Red Teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
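Lab: Advanced RAG Security Testing
Test RAG systems for weaknesses to chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG Context Poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Role Hierarchy Exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy among the system, user, and assistant roles.
Lab: Safety Filter Benchmarking
Benchmark each provider's safety filters with a standardized test suite, comparing detection rates and false positive rates.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic Injection Crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores for specific queries.
Lab: Session Isolation Testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: Streaming Response Manipulation
Exploit streaming response mechanisms to bypass output filters that only inspect complete responses.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured Output Exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: System Prompt Reconstruction (Intermediate Lab)
Reconstruct the complete system prompt through iterative partial extraction and cross-referencing of multiple leakage vectors.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: Tool Result Injection Attacks
Inject adversarial content through tool call results to poison the model's reasoning and redirect subsequent actions.
Lab: Unicode Normalization Bypass Attacks
Exploit differences in Unicode normalization between input validation and model processing to smuggle injection payloads.
Lab: Vector Database Access Control Testing
Test vector database access controls to find bypass vulnerabilities, including namespace traversal and filter manipulation.
Agent Goal Hijacking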
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant Prefill Injection Attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context Window Boundary Exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo Attack Implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-Plugin Exploit Chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building Custom Garak Probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Guardrail Latency-Based Detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom Inspect AI Safety Evaluations
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
JailbreakBench Comparative Testing
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
JSON Mode Safety Bypass
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
MCP Resource Data Exfiltration (Intermediate Lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Multi-Language Safety Gap Exploitation
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-Turn State Tracking Exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails Bypass Techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output Filter Bypass via Encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF Document Injection for RAG Systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt Template Variable Injection
Inject adversarial content through template variable expansion in prompt management systems.
PyRIT Custom Attack Strategies
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG Document Injection Campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic Search Poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Streaming Response Exfiltration
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System Prompt Extraction via Token Probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token Boundary Manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool Result Injection Attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting Text via Images to VLMs
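Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: Data Exfiltration via Webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Autonomous Vehicle AI Communication Assessment
Assess AI communication and decision-making systems in autonomous vehicle scenarios.
Code Review Assistant Assessment
Test a code review AI for vulnerabilities, including code analysis, suggestion generation, and repository access.
Content Moderation AI Assessment
Assess an AI content moderation system for bypass techniques, false positive manipulation, and adversarial content generation.
Customer Service Agent Red Teaming
Red team a customer service agent with tool access to order systems, refunds, and customer data.
Educational AI Tutor Assessment
Assess an AI tutoring system for content safety, answer manipulation, and student data exposure.
FinTech Chatbot Security Assessment
Conduct a full security assessment of a financial services chatbot that handles sensitive transactions.
Government Services AI Assessment
Assess data privacy and manipulation risks in an AI system that helps citizens use government services.
Medical Diagnosis AI Assessment
Assess a medical diagnosis AI for safety-critical vulnerabilities and data privacy compliance.
Insurance Claims AI Assessment
Conduct a full red team assessment of an AI system that processes insurance claims and provides approval recommendations.
Legal AI Document Review Assessment
Assess a legal AI system for contract review, examining document handling and privilege escalation vulnerabilities.
Medical Triage AI Assessment
Assess an AI medical triage system for safety-critical vulnerabilities and clinical decision manipulation risks.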
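Multi-Agent Workflow Assessment
Red team a multi-agent system of specialized agents that communicate via the A2A protocol.
AI Recruiting System Assessment
Red team an AI recruiting system, examining bias exploitation, resume injection, and candidate data privacy issues.
Smart Home AI Assistant Assessment
Red team a smart home AI assistant with device control, user profile access, and purchasing capabilities.
AI Supply Chain Pipeline Assessment
Assess a complete ML pipeline from data ingestion to model deployment for supply chain attacks.
AI Trading Bot Assessment
Red team an AI-driven trading system, examining market manipulation, data leakage, and adversarial input attacks.
Automated Penetration Testing Agent Assessment
Assess an AI-driven automated penetration testing agent, examining scope violations, unauthorized actions, and data handling.
Autonomous Vehicle Planning AI Assessment
Assess the safety boundaries of an autonomous vehicle planning AI through adversarial scenario injection.
Content Moderation AI Platform Assessment
Assess an AI content moderation system for bypass techniques, false negative exploitation, and bias.
Customer Data Platform AI Assessment
Red team an AI system integrated with a customer data platform that handles PII, behavioral data, and segmentation.
Cyber Threat Intelligence AI Assessment
Red team a cyber threat intelligence AI that processes IOCs, threat reports, and attack attribution.
Data Analytics Copilot Assessment
Red team a data analytics copilot with SQL generation capabilities and access to enterprise databases.
DevOps AI Assistant Security Assessment
Assess a DevOps AI assistant with access to CI/CD pipelines, cloud infrastructure, and deployment systems.
Enterprise Knowledge Management AI Assessment
Assess an enterprise knowledge management system with role-based access control, document permissions, and multi-tenant isolation.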
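Government Services Chatbot Assessment
Red team a government citizen services chatbot with access to tax records, benefits, and identity information.
AI-Driven Incident Response System Assessment
Red team an AI incident response system in a SOC environment with access to SIEM, EDR, and ticketing systems.
Insurance Claims Bot Security Assessment
Conduct a full security assessment of an insurance claims processing chatbot with access to policy and claims data.
Legal Contract Review AI Assessment
Red team an AI contract review system, examining attorney-client privilege violations, document injection, and hallucinated clauses.
Medical Triage Chatbot Assessment
Red team a medical triage chatbot, examining dangerous medical advice, data exposure, and safety-critical failures.
Multi-Agent Research Team Assessment
Assess a multi-agent research team of researcher, writer, and reviewer agents that communicate via the A2A protocol.
Real Estate Valuation AI Assessment
Assess an AI real estate valuation system, examining property valuation manipulation and data exposure.
AI Recruiting Assistant Assessment
Red team an AI recruiting assistant that screens resumes, schedules interviews, and accesses candidate data.
Smart Home AI Assistant Assessment (Simulation)
Assess a smart home AI assistant that controls lights, locks, cameras, and thermostats through tool integrations.
Supply Chain Optimization AI Assessment
Assess an AI supply chain optimization system, examining manipulation risks in demand forecasting and routing decisions.
AI Trading Assistant Security Assessment
Conduct a security assessment of an AI trading assistant with access to portfolio data and order execution.
Travel Booking Agent Red Teaming
Red team a travel booking AI agent with access to payment systems, loyalty programs, and personal data.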