# advanced
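162 articles tagged "advanced"
Agent Memory Poisoning
Techniques for injecting malicious content into agent memory systems (conversation history, RAG stores, and vector databases) to achieve persistent, cross-session compromise.
Manipulating Reasoning Chains
Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.
Attacking Multi-Agent Systems
Attack techniques targeting multi-agent architectures, covering inter-agent injection, trust boundary violations, cascading compromise, and A2A protocol attacks.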
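Advanced Practice Exam
A 25-question practice exam covering advanced AI red team techniques: multimodal attacks, training pipeline exploitation, agentic system attacks, embedding manipulation, and fine-tuning security.
Mock Exam 2: Advanced AI Security
A 25-question advanced mock exam covering multimodal attacks, training pipeline security, cloud AI security, forensics, and governance.
Advanced AI Security Mock Exam 1
Advanced mock exam covering agentic exploitation, training attacks, and frontier research.
Advanced AI Security Mock Exam 2
Second advanced mock exam focusing on multimodal, cloud, and pipeline security.
Agentic AI Security Mock Exam 2
Advanced mock exam covering agent memory poisoning, workflow exploitation, and A2A protocol attacks.
Comprehensive Advanced Mock Exam
Full-spectrum advanced mock exam covering all major AI red teaming domains.
Advanced Defense Mock Exam
Mock exam covering guardrail architecture, constitutional AI defenses, and behavioral monitoring implementation.
Advanced Governance Mock Exam (Assessment)
Advanced governance mock exam covering international regulation, audit methodologies, and organizational frameworks.
Advanced Prompt Injection Mock Exam
Mock exam covering indirect injection, multi-step chains, encoding bypasses, and semantic manipulation.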
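Advanced Cloud AI Security Assessment
A 15-question advanced assessment covering cloud AI attack surfaces across AWS, Azure, and GCP: guardrail bypass, knowledge base exploitation, managed identity abuse, model customization risks, and multi-cloud attack paths.
Advanced Tool Proficiency Assessment
Advanced assessment covering proficiency with Garak, PyRIT, HarmBench, and custom tool development.
Advanced Tradecraft Assessment
Assessment covering OPSEC, evasion, persistence, attribution avoidance, and professional red team operations.
Advanced Agentic Exploitation Assessment
Advanced assessment covering MCP exploitation chains, multi-agent attacks, and A2A protocol injection.
Advanced Cloud AI Security Assessment (Assessment)
Advanced assessment covering multi-cloud AI security, IAM misconfigurations, and endpoint hardening.
Advanced Code Generation Security Assessment
Advanced assessment covering autonomous coding agents, sandbox escapes, and supply chain attacks.
Advanced Embedding Security Assessment
Advanced assessment covering embedding inversion, vector DB attacks, and multimodal embedding exploitation.
Advanced AI Forensics Assessment
Advanced assessment covering model behavior forensics, training data breach analysis, and attack attribution.
Advanced Multimodal Attack Assessment
Advanced assessment covering cross-modal attacks, steganographic injection, and multimodal defense bypass.
Advanced Training Pipeline Assessment
Advanced assessment covering RLHF exploitation, DPO vulnerabilities, and federated learning attacks.
Advanced Certification Prep Guide
Study guide for the advanced certification, covering complex topics and hands-on preparation strategies.
Advanced Topics Study Guide
A study guide for advanced practitioners, covering AI security research methods, automation, forensics, emerging attack vectors, and tool development.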
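Capstone: Cloud AI Assessment
Capstone exercise: a complete security assessment of an enterprise cloud AI deployment, covering the IAM, network, data, and model layers.
Capstone Project: Defense System Implementation
Build a complete AI defense stack with input filtering, output monitoring, guardrails, rate limiting, and logging, then evaluate it with automated attacks.
Capstone Project: AI Incident Response Exercise
Respond to a simulated AI security incident through triage, investigation, containment, remediation, and a post-incident review report.
Capstone: Open-Source AI Security Contribution
Capstone project: make a meaningful contribution to an open-source AI security tool or dataset, such as new tests, documentation, or code improvements.
Capstone Project: Training Pipeline Attack and Defense
Attack a model training pipeline through data poisoning and backdoor implantation, then build defenses that detect and block these attacks.
Capstone Project: Building an AI Security Scanner
Design and implement an automated AI security testing tool that supports prompt injection detection, jailbreak testing, and output analysis.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Quarterly CTF: Summer 2026 Advanced Threats
The Summer 2026 CTF focuses on advanced threat scenarios, including supply chain attacks, multi-model exploitation, and infrastructure compromise.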
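Training Data Extraction (Advanced)
Advanced techniques for extracting training data verbatim from deployed language models.
Gradient-Based Attacks Explained
How gradients are used to craft adversarial inputs for large language models, explaining the FGSM, PGD, and GCG attacks with accessible math and practical examples.
Scaling Laws, Emergence, and Capability Jumps
How scaling laws predict model performance, why emergent capabilities produce unpredictable safety properties, and what sleeper capabilities and emergent misalignment mean for red teams.
Advanced AI Red Team Labs
Advanced hands-on labs covering automated attack algorithms, production-environment red team tooling, multi-model testing, and sophisticated exploitation techniques for senior AI security practitioners.
Advanced RAG Manipulation Lab
Execute sophisticated RAG manipulation, including embedding-space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a complete adversarial robustness evaluation framework for assessing a model's security posture.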
Lab: Alignment Boundary Probing
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
Lab: Attack Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Automated Defense Evaluation Framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: Building an Automated Red Team Pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: Backdoor Persistence Through Safety Training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab on conducting an end-to-end security assessment of a cloud-deployed AI system, including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Constitutional AI Bypass Techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: Cross-Lingual Transfer Attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-Model Transfer Attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: Training Curriculum Poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential Testing Across Models
Use differential testing to find behavior inconsistencies across model providers.
Lab: Distributed Training Attack Simulation
Simulate attacks on distributed training infrastructure, including gradient poisoning and aggregation manipulation.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Evaluation Framework Manipulation
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: Federated Learning Poisoning Attacks
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Fine-Tuning Backdoor Insertion
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail Bypass Ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: Inference-Time Compute Exploitation
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: Jailbreak Transferability Analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise, including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Multi-Hop MCP Server Exploitation
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: Long-Term Memory Extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Merging Security Analysis
Analyze the security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: Model Steering with Activation Vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: Multi-Modal Attack Chain Construction
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multi-Turn Crescendo Attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Neural Network Trojan Detection
Implement and test neural network trojan detection methods, including activation clustering and spectral analysis.
Output Constraint Exploitation
Exploit structured output requirements to force models into revealing restricted information.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Building a Prompt Injection Compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt Leakage via Log Probabilities
Extract system prompt information using token log probability analysis.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: Quantization Security Testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG Poisoning Techniques
Execute sophisticated RAG poisoning, including gradient-guided document crafting.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: Advanced Red Team Orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: Representation Engineering for Security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: Advanced Reward Hacking Techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: RLHF Training Exploitation
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: Safety Classifier Model Extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: Sparse Attention Exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: Model Supply Chain Poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Lab: Token-Level Adversarial Optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: Tool Chain Lateral Movement
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
A2A Protocol Task Injection
Inject malicious tasks into agent-to-agent (A2A) protocol communication channels to redirect multi-agent workflows.
Agent Workflow State Tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio Injection via Speech-to-Text Models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
AutoDAN Implementation and Testing
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Constitutional Classifier Bypass
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-Model GCG Transfer Attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a Red Team Results Dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential Model Behavior Analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding Inversion Attack Implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-Tuning Alignment Removal Attack
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
GUI Agent Visual Injection Attack
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
HarmBench Custom Attack Submission
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
LangChain CVE Exploitation Lab
Reproduce and analyze LangChain CVEs, including CVE-2023-29374 and CVE-2023-36258, in a safe lab environment.
LoRA Backdoor Insertion Attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
MCP Multi-Server Pivot Chain
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent Memory Injection for Persistent Access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Multi-Agent Trust Boundary Exploitation
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Multi-Modal Attack Chain Orchestration
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Promptfoo Red Team Test Suite Development
Build complete red team test suites in Promptfoo with custom graders and multi-model targeting.
Quantization-Induced Safety Regression Testing
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
RAG Access Control Bypass
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Reasoning Trace Exploitation in CoT Models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation Probing for Vulnerability Discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Safety Classifier Reverse Engineering
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
Code Agent Sandbox Escape Techniques
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Steganographic Payload Embedding in Images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
Training Data Extraction from Production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
LLM Watermark Detection and Removal
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: LLM Watermark Detection and Removal
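Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
CTF: Cloud AI Heist
Extract secrets from a cloud-deployed AI application by exploiting misconfigurations, SSRF, metadata endpoints, and model-layer vulnerabilities.
CTF: Defense Breaker
Bypass progressively hardened defenses protecting an AI system. Each level adds a new defensive layer while retaining all previous ones, testing your ability to adapt and chain bypass techniques.
CTF: Fine-Tune Detective
Detect backdoors in fine-tuned language models through behavioral analysis, weight inspection, and activation pattern examination. Practice the forensic techniques needed to identify compromised models before deployment.
CTF: AI Forensics Investigation
Analyze logs, model outputs, and system artifacts to reconstruct an AI security incident. Develop forensic analysis skills for AI-specific attack patterns, data exfiltration traces, and adversarial prompt detection.
CTF: Supply Chain Attack
Find and exploit ML supply chain vulnerabilities, including compromised dependencies, poisoned models, backdoored training data, and malicious model files. Practice ML-specific supply chain security assessment.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns, including nested calls, parallel execution, and schema manipulation.
Lab: Advanced RAG Poisoning
Hands-on lab on crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Simulation: AI Bug Bounty
Find and report vulnerabilities in a simulated AI bug bounty program, practicing professional vulnerability disclosure and bounty-eligible report writing.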
Simulation: Red vs Blue
Competitive exercise where teams alternate between attacking and defending an AI application, scoring points for successful attacks and effective defenses.
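Defense Evasion
Advanced techniques for bypassing the safety filters, content classifiers, guardrails, and detection systems deployed to protect large language model applications.
Continuous Red Teaming Programs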
Designing and operating ongoing AI red team programs with automated testing pipelines, metrics dashboards, KPI frameworks, alert-driven assessments, and integration with CI/CD and model deployment workflows.
Purple Teaming for AI
Collaborative attack-defense exercises for AI systems: structuring purple team engagements, real-time knowledge transfer, joint attack simulation, and measuring defensive improvement through iterative testing.
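Advanced Training Vulnerabilities
Advanced security threats in AI training, covering federated learning attacks, model merging risks, watermark removal, synthetic data poisoning, unlearning attacks, and continual learning vulnerabilities.
Compound Attack Chains Walkthrough
Combine multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that exploit the strengths of each technique.
Advanced Context Manipulation Walkthrough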
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Garak Advanced Configuration Guide
A deep dive into the Garak configuration system, including custom probes, detectors, and generators.
Writing Custom Garak Probes for Novel Attack Vectors
Advanced walkthrough for building custom Garak probes that target novel and emerging attack vectors, covering probe architecture, payload generation, detector pairing, and integration into automated scanning pipelines.
Building Custom Garak Detectors
Advanced walkthrough on creating custom garak detectors for specific success criteria, including regex-based detectors, ML-based classifiers, multi-signal scoring, and integration with external evaluation services.
Writing Garak Generator Plugins for Custom API Targets
Advanced walkthrough on writing garak generator plugins to connect to custom API endpoints, proprietary model servers, and non-standard inference interfaces for vulnerability scanning.
Setting Up Garak Probes for MCP Tool Interactions
Advanced walkthrough on configuring garak probes that target Model Context Protocol (MCP) tool interactions, testing for tool misuse, privilege escalation through tools, and data exfiltration via tool calls.