# walkthroughs

walkthroughsa2aprotocolinjection

A2A 協定注入演練

利用 Google 的 Agent-to-Agent 協定進行跨代理提示詞注入的詳解。

walkthroughsadversarial-suffixcraftingoptimization

對抗性後綴打造演練

以梯度式與無梯度最佳化方法為黑盒模型打造對抗性後綴。

walkthroughsapiabuse-chainsmulti-step

API 濫用鏈攻擊詳解

鏈接多個 API 呼叫以實現任何單次呼叫都無法允許的未授權行動。

chainingattacksexploitationapiwalkthroughs

API 鏈式利用詳解

詳解代理系統中鏈接多個 API 呼叫以實現多步驟未授權行動的方法。

walkthroughsattentionhijackingtransformer

注意力劫持攻擊詳解

Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.

walkthroughsautodanstealthyjailbreaking

AutoDAN 隱蔽越獄詳解

Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.

walkthroughsbatch-processinginjectionpipeline

批次處理注入詳解

Inject payloads through batch processing pipelines where individual items are processed without isolation.

walkthroughscachepoisoninginfrastructure

LLM 快取投毒詳解

Poison LLM response caches to serve adversarial content to other users without direct injection.

attackscompetitivewalkthroughwalkthroughsjailbreak

競賽風格越獄技術詳解

Walkthrough of jailbreak techniques used in AI security competitions and CTF events.

attacksinjectionwalkthroughusecomputerwalkthroughs

Computer Use Agent Injection 詳解

Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.

walkthroughscontextmanipulationadvanced

Advanced Context Manipulation 詳解

Manipulate conversation context through history injection, context overflow, and attention weight exploitation.

diveattackscontextdeepmanipulationwalkthroughs

上下文操控深度探討

Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.

walkthroughscrescendomulti-turnjailbreaking

漸進式多輪越獄詳解

Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.

walkthroughscross-pluginexfiltrationagent

Cross-Plugin Data Exfiltration 詳解

Walkthrough of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.

attacksharvestingdatawalkthroughwalkthroughs

Data Harvesting Through LLM Apps

Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.

walkthroughsdocumentinjectionfile-processing

Document-Based Injection 詳解

Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.

walkthroughsembeddingcollisionretrieval

Embedding Collision 攻擊詳解

Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.

walkthroughsembeddinginversionattack

Embedding Inversion 攻擊詳解

Walkthrough of inverting text embeddings to recover original documents from vector databases.

walkthroughsencodingchain-attacksevasion

Encoding Chain 攻擊詳解

Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.

attacksencodingchainbypasswalkthroughwalkthroughs

編碼鏈繞過詳解

Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.

walkthroughsfew-shotpoisoningin-context-learning

Few-Shot Example Poisoning 詳解

Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.

walkthroughsfine-tuningsafety-bypasstraining

Fine-Tuning Safety Bypass 詳解

Walkthrough of using fine-tuning API access to remove safety behaviors from aligned models.

walkthroughsfunction-callingexploitationagent

函式呼叫利用指南

Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.

walkthroughsfunction-schemapoisoningtool-use

Function Schema Poisoning 詳解

Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.

walkthroughsgcgadversarialoptimization

GCG Adversarial Suffix 攻擊詳解

Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.

walkthroughsindirect-injectionreal-worldweb

Real-World Indirect Prompt Injection

Walkthrough of planting and triggering indirect prompt injection in web-browsing AI assistants.

walkthroughsjsoninjectionstructured-output

JSON Injection 攻擊詳解

Exploit JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.

walkthroughslogit-biasexploitationapi-parameters

Logit Bias Exploitation 詳解

Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.

walkthroughsmany-shotjailbreakinganthropic

Many-Shot Jailbreaking 詳解

Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.

walkthroughsmarkdowninjectionrendering

Advanced Markdown Injection 詳解

Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.

walkthroughsmcptool-poisoningagent

MCP Tool Poisoning 攻擊詳解

Walkthrough of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.

attacksmemorywalkthroughpersistencewalkthroughs

Memory Persistence 攻擊詳解

Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.

walkthroughscontext-overflowattentioninjection

Model Context Window Overflow 詳解

Overflow the context window to push safety instructions outside the effective attention range.

walkthroughsmodel-extractionstealingapi

Model Extraction 攻擊詳解

Walkthrough of extracting model weights/behavior through systematic API querying.

walkthroughsmultimodaldocumentcombined-attack

Multi-Modal Document 攻擊詳解

Combine visual and textual injection in documents processed by multimodal LLM applications.

walkthroughsmulti-turnpersonaescalation

Multi-Turn Persona Escalation 詳解

Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.

walkthroughsmultimodalimageinjection

多模態圖像注入詳解

Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.

walkthroughspairautomatedjailbreaking

PAIR 自動越獄詳解

End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.

attacksagentsprivilegeescalationwalkthroughwalkthroughs

Agent Privilege Escalation 詳解

Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.

walkthroughsprompt-leakingadvancedextraction

Advanced Prompt Leaking 詳解

Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.

walkthroughsragpoisoningdata

RAG Poisoning End-to-End 詳解

Complete walkthrough of poisoning a RAG system from document injection through information extraction.

walkthroughsreasoningexploitationchain-of-thought

Reasoning Model Exploitation 詳解

Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

walkthroughsregexfilter-bypassevasion

Regex Filter Bypass 詳解

Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.

walkthroughsrole-confusioninstruction-hierarchyattacks

Role Confusion 攻擊詳解

Exploit role confusion between system, user, and assistant messages to override safety instructions.

attacksroleexploitationconfusionwalkthroughs

Role Confusion Exploitation 詳解

Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.

attackscamouflagesemanticwalkthroughwalkthroughs

語意偽裝詳解

Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.

walkthroughssemanticdeceptionattacks

Semantic Deception 攻擊詳解

Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.

walkthroughssleeper-agentsdetectionalignment

Sleeper Agent Detection 詳解

Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.

walkthroughssupply-chaininjectiondependencies

Supply Chain Prompt Injection 詳解

Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.

attackssupplychainpoisoningwalkthroughwalkthroughs

Model Supply Chain Poisoning

Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.

walkthroughssynthetic-identityinjectionsocial-engineering

Synthetic Identity Injection 詳解

Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.

walkthroughstaptree-of-attacksjailbreaking

TAP Tree of 攻擊s 詳解

Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.

walkthroughsthought-injectionreasoningchain-of-thought

Thought Injection in Reasoning Models 詳解

Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.

walkthroughstimingside-channelinformation-leakage

Timing Side-Channel 攻擊詳解

Extract information from LLM applications through timing differences in response generation and safety filter processing.

attacksoptimizationtokenwalkthroughs

Token-Level 攻擊 Optimization 詳解

Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.

walkthroughstool-shadowingmcpagent

Tool Shadowing 攻擊詳解

walkthroughsvision-modelattacksmultimodal

視覺模型攻擊詳解

Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.

attacksvisionmodelwalkthroughwalkthroughs

Vision Model 攻擊詳解 (Attack 詳解)

Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.

walkthroughsa2aimpersonationmulti-agent

A2A Agent Impersonation 詳解

Walkthrough of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.

walkthroughsapi-parametersexploitationtechnique

API Parameter Exploitation 詳解

Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.

walkthroughsbatch-apiexploitationtechnique

Batch API Exploitation 詳解

Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.

walkthroughschain-of-thoughthijackingreasoning

Chain-of-Thought Hijacking 詳解

Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.

walkthroughscognitive-overloadjailbreakingtechnique

Cognitive Overload Jailbreak 詳解

Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.

walkthroughscontext-virtualizationjailbreakingtechnique

Context Virtualization 攻擊詳解

Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.

walkthroughsconversation-poisoningmulti-turntechnique

Conversation Poisoning 攻擊詳解

Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.

walkthroughsdocument-parsingexploit-chainpdf

Document Parsing Exploit Chain 詳解

End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.

walkthroughsfew-shotscalingjailbreaking

Few-Shot 攻擊 Scaling Analysis

Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.

walkthroughssteganographyimage-injectionmultimodal

Image Steganography Injection 詳解

Walkthrough of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.

walkthroughsinstruction-hierarchyexploitationtechnique

Instruction Hierarchy Exploitation 詳解

Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.

walkthroughsknowledge-graphinjectionattack

Knowledge Graph Injection 攻擊詳解

Walkthrough of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.

walkthroughslangchainexploit-chaincve

LangChain Exploit Chain 詳解

Walkthrough of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.

walkthroughsmcpsampling-apiabuse

MCP Sampling API Abuse 詳解

Walkthrough of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.

walkthroughsmemory-injectionpersistenceagent

Memory Injection and Persistence 詳解

Walkthrough of injecting persistent instructions into agent memory systems that survive across sessions.

walkthroughsmulti-providerattack-campaigntechnique

Multi-Provider 攻擊 Campaign 詳解

End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.

walkthroughsoutput-formatexfiltrationtechnique

Output Format Exfiltration Techniques 詳解

Walkthrough of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.

walkthroughsraghybrid-searchpoisoning

RAG Hybrid Search Poisoning 詳解

Walkthrough of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.

walkthroughsrecursiveinjectionself-replicating

Recursive Prompt Injection 詳解

Walkthrough of creating self-replicating injection payloads that persist through model output-to-input loops.

walkthroughsrefusal-suppressionjailbreakingtechnique

Refusal Suppression Technique 詳解

Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.

walkthroughssemantic-obfuscationevasiontechnique

Semantic Obfuscation 詳解

Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.

walkthroughsstructured-outputdata-leakagetechnique

Structured Output Data Leakage 詳解

Walkthrough of using structured output requirements to extract sensitive data embedded in model responses.

walkthroughstool-useconfusionfunction-calling

Tool Use Confusion 攻擊詳解

Walkthrough of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.

walkthroughsunicodenormalizationbypass

Unicode Normalization Bypass 詳解

Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.

walkthroughsvoice-aiinjectionspeech

Voice AI Prompt Injection 詳解

Walkthrough of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.

walkthroughswatermarkanalysisdetection

LLM Watermark Analysis 詳解

Walkthrough of detecting and analyzing watermarks in LLM-generated text using statistical methods.

walkthroughsxmlinjectionstructured-data

XML Injection in LLM Systems 詳解

Exploit XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.

attacksxmlinjectionwalkthroughjsonwalkthroughs

XML and JSON Injection in LLM Apps

Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.

walkthroughsdefenserobustness-testingframework

對抗韌性測試框架

建構一個框架，持續測試已部署的 LLM 防禦機制的對抗韌性。

walkthroughsdefenseadversarial-trainingrobustness

用於 LLM 防禦的對抗訓練（防禦逐步指南）

實作對抗訓練技術，提升 LLM 對抗提示注入與越獄的韌性。

walkthroughsdefenseagent-permissionsboundaries

代理權限邊界的強制執行

為 LLM 代理實作細緻的權限邊界，依脈絡與使用者角色限制工具存取。

automatedtestingwalkthroughsdefense

Automated 防禦 Regression Testing

Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.

walkthroughsdefenseautomated-red-teamcontinuous

自動化紅隊防禦迴路

建構一個自動化的持續紅隊測試迴路，以迭代方式發現並修補漏洞。

walkthroughsdefensebehavioral-anomalydetection

LLM 的行為異常偵測

實作行為異常偵測，辨識模型輸出何時偏離預期的安全樣貌。

walkthroughsdefenseclassifierinput-safety

Building an Input Safety Classifier

Build a production-quality input classifier that detects prompt injection attempts.

walkthroughsdefenseconstitutional-aialignment

Constitutional AI Implementation 指南

Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.

walkthroughsdefensecontext-isolationpatterns

脈絡隔離模式實作

實作脈絡隔離模式，防止系統提示與使用者資料之間的指令外洩。

walkthroughsdefensedlpdata-protection

LLM 應用的資料外洩防護

為 LLM 應用建構資料外洩防護層，防止輸出中暴露敏感資料。

benchmarkingsystemwalkthroughsdefense

防禦基準測試系統詳解

Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.

walkthroughsdefensedefense-in-deptharchitecture

Defense-in-Depth Architecture for LLM Apps

Design and implement a complete defense-in-depth architecture for production LLM applications.

setupdefenseresponseincidentwalkthroughs

AI Incident Response System 設置

Set up comprehensive incident response capabilities for AI-specific security incidents.

walkthroughsdefenseembedding-firewallinput-filtering

輸入嵌入防火牆部署

部署基於嵌入的防火牆，透過與已知攻擊模式的語意相似度偵測注入嘗試。

walkthroughsdefenseloggingforensics

用於防禦的安全輸入／輸出記錄

為 LLM 的輸入／輸出配對實作安全的記錄機制，在不洩漏敏感資料的前提下支援鑑識分析。

walkthroughsdefensellm-firewallarchitecture

LLM 防火牆架構設計（防禦逐步指南）

設計並實作一個全面的 LLM 防火牆，具備輸入分析、輸出過濾與行為監控。

deploymentdefensellmhoneypotwalkthroughs

LLM 蜜罐部署詳解

Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.

defensemonitoringmodelbehaviorwalkthroughs

Model Behavior Monitoring 設置

Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.

walkthroughsdefensemonitoringabuse-detection

Monitoring LLM Applications for Abuse

Build a monitoring and alerting system to detect ongoing attacks against LLM applications.

walkthroughsdefensemulti-modelconsensus

多模型安全共識

實作安全共識機制，要求多個模型達成一致才能執行敏感動作。

walkthroughsdefenseoutput-filteringpipeline

Output Filtering Pipeline Design

Design and implement a multi-stage output filtering pipeline for LLM applications.

walkthroughsdefenseoutput-groundingverification

輸出基準對齊與驗證

實作輸出基準對齊驗證，確保 LLM 回應在事實上有所提供的脈絡支持。

defenseimplementationarmorpromptwalkthroughs

Prompt Armor Implementation 指南

Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.

walkthroughsdefensehoneypotdetection

提示注入蜜罐部署

部署蜜罐提示與金絲雀資料，以偵測並刻畫提示注入嘗試。

walkthroughsdefenseragsanitization

RAG Input Sanitization 詳解

Implement input sanitization for RAG systems to prevent document-based injection attacks.

defenserealtimedetectionattackwalkthroughs

即時攻擊偵測系統詳解

Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.

walkthroughsdefensered-teamfeedback-loop

Red Team-Defense Feedback Loop

Build a continuous red team-defense improvement loop with automated testing and metric tracking.

walkthroughsdefensewatermarkingtraceability

回應浮水印實作

實作回應浮水印，以實現可追溯性並偵測 LLM 輸出的未授權重製。

walkthroughsdefenseruntime-monitorsafety

執行期安全監控器實作

實作執行期安全監控器，即時偵測並封鎖不安全的模型輸出。

walkthroughsdefensesafety-classifiertraining

訓練自訂安全分類器

訓練針對你應用程式特定威脅模型與內容政策調校的自訂安全分類器。

architecturesecuredefenseagentwalkthroughs

Secure Agent Architecture Design

Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.

callingsecuredefensepatternsfunctionwalkthroughs

安全函式呼叫模式詳解

Implement secure function calling with input validation, output sanitization, and capability restrictions.

architecturesecuredefenseragwalkthroughs

Secure RAG Architecture 詳解

Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.

walkthroughsdefensesecure-ragpipeline

安全的 RAG 管線架構

建構安全的 RAG 管線，具備文件淨化、取回驗證與抗注入的增強。

walkthroughsdefensesystem-promptprotection

系統提示保護技術

實作多層次的系統提示保護，以對抗擷取、覆寫與外洩攻擊。

threatsecuritydefenseintelligencewalkthroughs

AI Security Threat Intelligence

Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.

walkthroughsdefensetoken-filteringinput-safety

Token 層級輸入過濾

實作 Token 層級的輸入過濾，在模型處理之前、於 token 化階段偵測注入模式。

walkthroughsdefensetool-authorizationframework

工具呼叫授權框架

實作工具呼叫授權框架，在執行之前依政策驗證工具的呼叫。

walkthroughsdefenseagentaccess-control

Agent Tool Access Control Implementation

Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.

walkthroughsdefenseanomaly-detectiontraffic

Anomaly Detection for LLM Traffic

Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.

walkthroughsdefenseconversation-integritymonitoring

Conversation Integrity Monitoring

Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.

walkthroughsdefensemetricsdashboard

Defense Effectiveness Metrics Dashboard

Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.

walkthroughsdefensetestingautomation

Automated 防禦 Testing Pipeline

Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.

walkthroughsdefenseembeddingpoisoning-detection

Embedding Poisoning Detection System

Build a detection system for identifying poisoned documents in vector databases using statistical analysis.

walkthroughsdefensefunction-callingguardrails

Function Calling Guardrails Implementation

Implement guardrails for function calling that validate tool selection, parameters, and execution scope.

walkthroughsdefensehoneypotdeception

LLM Honeypot Deployment 指南

Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.

walkthroughsdefenseincident-responseplaybook

Incident Response Playbook for LLM Applications

Design and implement an incident response playbook specific to LLM application security incidents.

walkthroughsdefensellm-guardproduction

LLM Guard Production Deployment 指南

Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.

walkthroughsdefensemcpserver-hardening

MCP Server Security Hardening 指南

Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.

walkthroughsdefensemulti-modelensemble

Multi-Model 防禦 Ensemble

Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.

walkthroughsdefensenemo-guardrailsadvanced

NeMo Guardrails Advanced Configuration

Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.

walkthroughsdefenseoutput-scanningpipeline

Building an Output Scanning Pipeline

Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.

walkthroughsdefensepiiredaction

PII Detection and Redaction for LLMs

Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.

walkthroughsdefenseclassifiertraining

Training a Prompt Injection Classifier

Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.

walkthroughsdefenseragsandboxing

RAG Document Sandboxing Implementation

Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.

walkthroughsdefenserate-limitingadaptive

Adaptive Rate Limiting for LLM APIs

Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.

walkthroughsdefenseragsecure-architecture

Secure RAG Architecture Implementation

Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.

walkthroughsdefensesystem-promptprotection

System Prompt Protection Layers

Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.

architecturetrustdefensellmzerowalkthroughs

Zero Trust Architecture for LLM Apps

Implement zero trust principles in LLM application architecture with continuous verification and least privilege.

walkthroughsengagementcode-assistantsecurity

Full 演練: AI Code Assistant

End-to-end engagement for assessing an AI-powered code assistant with repository access.

generationcontentengagementwalkthroughsengagements

Full 演練: Content Generation Platform

Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.

analyticsengagementcustomerwalkthroughsengagements

完整演練：客戶分析 AI 安全評估

Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.

devtoolsengagementwalkthroughsengagements

Full 演練: AI Developer Tools

End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.

educationalplatformengagementwalkthroughsengagements

Full 演練: Educational AI Platform

End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.

walkthroughsengagementfinancialadvisor

Full 演練: AI Financial Advisor

Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.

governmentengagementwalkthroughsengagements

Full 演練: Government AI System

Full red team engagement for a government-deployed AI system with classification and compliance requirements.

walkthroughsengagementhealthcarecompliance

Full 演練: Healthcare AI System

End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.

walkthroughsengagementinternalknowledge-bot

完整演練：內部知識機器人安全評估

Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.

walkthroughsengagementmulti-agentplatform

完整演練：多代理平台安全評估

Full red team engagement of a multi-agent platform with MCP servers and A2A communication.

walkthroughsengagementsaaschatbot

Full 演練: SaaS Customer Chatbot

End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.

auditsupplychainengagementwalkthroughsengagements

Full 演練: ML Supply Chain Audit

End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.

walkthroughsengagementcoding-agentautonomous

Full 演練: Autonomous Coding Agent

End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.

walkthroughsengagementcontent-generationplatform

Full 演練: AI Content Generation Platform

Red team engagement of an AI content generation platform used for marketing, SEO, and social media.

walkthroughsengagementcustomer-supportmulti-tool

Full 演練: Multi-Tool Customer Support Agent

End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.

walkthroughsengagementdevopsinfrastructure

Full 演練: DevOps AI Assistant

End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.

walkthroughsengagementdocument-processingpipeline

Full 演練: Document Processing Pipeline

Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.

walkthroughsengagementeducationplatform

Full 演練: AI Education Platform

End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.

walkthroughsengagementgovernmentcitizen-services

Full 演練: Government Citizen Services AI

Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.

walkthroughsengagementhrbenefits

Full 演練: HR Benefits Chatbot

Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.

walkthroughsengagementinsuranceunderwriting

Full 演練: AI Insurance Underwriting

Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.

walkthroughsengagementlegalresearch

Full 演練: Legal Research AI System

Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.

walkthroughsengagementmulti-agententerprise

Full 演練: Enterprise Multi-Agent System

Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.

walkthroughsengagementretailpersonalization

Full 演練: Retail Personalization AI

Red team engagement of an AI personalization engine with access to customer profiles and purchase history.

walkthroughsengagementsecuritycopilot

Full 演練: AI Security Copilot

Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.

walkthroughsengagementsupply-chainoptimizer

Full 演練: Supply Chain AI Optimizer

End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.

walkthroughsengagementtelehealthhealthcare

Full 演練: Telehealth AI Assistant

End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

逐步導覽

涵蓋紅隊工具、案件方法論、防禦實作、平台特定測試與完整案件工作流程的逐步引導式導覽。

入門

對抗性模擬設計

設計貼近真實威脅行為者與攻擊情境的對抗性模擬,用於 AI 系統的紅隊演練。

walkthroughsmethodologyadversarial-simulationdesign

walkthroughsmethodologyagenticassessment

代理式系統評估方法論

為代理式 AI 系統(包含工具使用、記憶與多代理互動)設計的完整評估方法論。

pentestmethodologyplanningwalkthroughs

AI 滲透測試規劃

規劃 AI 專屬滲透測試的完整方法論,涵蓋範圍界定、資源分配與時程。

methodologymaturityteamredwalkthroughs

AI 紅隊成熟度模型(方法論詳解)

用以評估並改善組織 AI 紅隊演練能力的成熟度模型。

metricssecurityframeworkmethodologywalkthroughs

AI 安全指標框架

以量化指標衡量與回報 AI 安全態勢的框架。

vulnerabilityclassificationmethodologywalkthroughs

AI 漏洞分類系統

依類型、影響與可利用性為 AI 特有漏洞分類的結構化系統。

walkthroughsmethodologyattack-prioritizationframework

攻擊優先排序框架

依目標架構、時間限制與成功機率,為攻擊技術排定優先順序。

walkthroughsmethodologyattack-treesplanning

LLM 系統攻擊樹建構

使用 MITRE ATLAS 與 OWASP 對應建構 LLM 系統評估的系統化攻擊樹。

walkthroughsmethodologyautomated-reconworkflow

自動化 AI 偵察工作流程

建立自動化偵察工作流程,用於盤點 AI 應用的架構、模型與防禦配置。

walkthroughsmethodologycollaborativeteam

協作式 AI 紅隊評估

協調多人紅隊評估,包含角色分工、溝通協定與發現去衝突。

securityanalysismethodologycompetitivewalkthroughs

AI 安全工具競品分析

針對紅隊作戰用途評估並比較 AI 安全工具的方法論。

walkthroughsmethodologycompliancetesting

合規導向測試方法論

將法規要求對應至具體測試案例,以支援合規導向的 AI 紅隊評估。

walkthroughsmethodologycontinuous-monitoringintegration

持續監控整合方法論

將紅隊發現整合進持續監控系統,以進行持續威脅偵測與防禦驗證。

walkthroughsmethodologydeduplicationtriage

發現去重與分級

將自動化與人工測試的發現去重並分級,產出可行動且已排序的漏洞報告。

walkthroughsmethodologymulti-modeltesting

多模型測試方法論

為處理流程中使用多個 LLM 模型的應用程式,建立結構化測試方法論。

walkthroughsmethodologypost-engagementanalysis

任務後續分析方法論

進行完整的任務後續分析,包含經驗教訓、技術成效與方法論精進。

walkthroughsmethodologypurple-teamcollaborative

紫隊 AI 評估方法論

以協作式紫隊方法進行 AI 評估,紅藍隊作戰之間即時互相回饋。

walkthroughsmethodologyregression-testingcontinuous

AI 安全回歸測試方法論

設計回歸測試套件,驗證安全修復在模型更新與部署後仍維持有效。

walkthroughsmethodologyrisk-basedprioritization

風險導向 AI 測試方法

採用風險導向的測試方法,將評估資源聚焦於影響最大的漏洞類別。

managementstakeholderredteammethodologywalkthroughs

AI 紅隊演練的利害關係人管理

在 AI 紅隊任務全程管理利害關係人期待與溝通。

walkthroughsmethodologystakeholder-reportingcommunication

利害關係人導向回報方法論

為不同利害關係人(主管、開發者、安全團隊、合規人員)量身製作紅隊報告。

securitytabletopmethodologyexercisewalkthroughs

AI 安全桌上推演

設計並主持以 AI 安全事件情境為主題的桌上推演。

walkthroughsmethodologytime-boxedefficient

時間限制型 AI 評估方法論

以優先排序的攻擊樹與平行測試,在嚴格時間限制下最大化測試覆蓋率。

walkthroughsmethodologyattack-surfaceenumeration

AI 攻擊面列舉方法論

為 AI 驅動應用程式系統性列舉完整攻擊面的方法論。

walkthroughsmethodologycompliancetesting

AI 合規測試方法論

針對法規合規要求(含 EU AI Act 與 NIST)測試 AI 系統的方法論。

walkthroughsmethodologymitre-atlasmapping

將發現對應至 MITRE ATLAS

將 AI 紅隊發現對應至 MITRE ATLAS 戰術、技術與程序的方法論。

walkthroughsmethodologyreport-structurepentest

AI 滲透測試報告結構

AI 滲透測試的詳細報告結構,含發現樣板與嚴重度評分。

walkthroughsmethodologyscopingtemplates

AI 紅隊範圍界定樣板

跨不同應用類型界定 AI 紅隊任務範圍的樣板與程序。

walkthroughsmethodologyrisk-quantificationmetrics

AI 風險量化方法論

針對 AI 漏洞進行量化風險評估的方法論,含機率與影響評分。

walkthroughsmethodologytabletopexercise

AI 安全桌上推演設計

設計並主持 AI 安全桌上推演,用以評估組織備戰能力。

walkthroughsmethodologyevidencecollection

AI 測試期間的證據蒐集

AI 紅隊評估期間蒐集、組織與保存證據的最佳實務。

walkthroughsmethodologymulti-modelassessment

多模型評估方法論

評估使用多個 AI 模型(管線或整體配置)的應用程式方法論。

walkthroughsmethodologyowasptesting-guide

OWASP LLM Top 10 測試方法論

針對 OWASP LLM Top 10 2025 每一項漏洞的完整測試方法論。

walkthroughsmethodologypurple-teamoperations

AI 安全紫隊作戰

將紅隊攻擊與藍隊防禦精進結合的紫隊作戰方法論。

walkthroughsmethodologyregression-testingcontinuous

AI 安全回歸測試方法論(方法論詳解)

AI 應用在更新與模型變更後進行持續回歸測試的方法論。

walkthroughsmethodologystakeholder-communicationrisks

向利害關係人傳達 AI 風險

向技術與非技術利害關係人有效傳達 AI 安全風險的指南。

walkthroughsmethodologythreat-intelligenceai-systems

AI 系統的威脅情資

針對 AI 系統攻擊與防禦蒐集並應用威脅情資的方法論。

walkthroughsmethodologyvulnerability-prioritizationframework

AI 漏洞優先排序框架

依可利用性、影響與修復成本對 AI 漏洞排序的框架。

walkthroughsmethodologyreportingprofessional

撰寫 AI 紅隊報告

撰寫清晰、可行動的 AI 紅隊評估報告(含發現與建議)的指南。

ai21platformstestingmodelswalkthroughs

測試 AI21 Labs 模型

AI21 Labs Jamba 模型(含長上下文與效率特性)的紅隊測試指南。

walkthroughsplatformsanthropicclaude

測試 Anthropic Claude:完整指南

Anthropic Claude 的完整紅隊測試指南(含工具使用、延伸思考與電腦使用)。

walkthroughsplatformsawsbedrock

測試 AWS Bedrock 部署

透過 AWS Bedrock 部署的模型(含護欄與存取控制)的紅隊測試指南。

walkthroughsplatformsazureopenai

測試 Azure OpenAI 服務

Azure OpenAI(含內容過濾、受管身分與網路控制)的紅隊測試指南。

cohereplatformstestingmodelswalkthroughs

測試 Cohere 模型

Cohere Command-R 系列模型(含 RAG 與工具使用特性)的紅隊測試指南。

walkthroughsplatformsdeepseekreasoning

測試 DeepSeek 模型

DeepSeek 模型(含 R1 推論利用與 MoE 架構)的紅隊測試指南。

fireworksplatformstestingwalkthroughs

測試 Fireworks AI 平台

Fireworks AI(含函式呼叫與複合 AI 系統)的紅隊測試指南。

walkthroughsplatformsgooglegemini

測試 Google Gemini:完整指南

Google Gemini 的完整紅隊測試指南(含多模態輸入與長上下文)。

inferencegroqplatformstestingwalkthroughs

測試 Groq 推論平台

Groq 高速推論平台及其安全特性的紅隊測試指南。

walkthroughsplatformshuggingfacehosted

測試 Hugging Face 代管模型

Hugging Face 代管模型(含 Inference API 與 Spaces)的紅隊測試指南。

walkthroughsplatformsmetallama

測試 Meta Llama:完整指南

Meta Llama 模型的完整紅隊測試指南(含微調與自架部署)。

mistralplatformstestingmodelswalkthroughs

測試 Mistral AI 模型

Mistral AI 模型的完整紅隊測試指南,涵蓋 Mixtral MoE 架構與聊天端點。

localollamaplatformstestingwalkthroughs

測試 Ollama 本地部署

透過 Ollama 本地部署模型的安全測試指南,含網路暴露面與 API 安全。

walkthroughsplatformsopenaigpt-4o

測試 OpenAI GPT-4o:完整指南

OpenAI GPT-4o 的完整紅隊測試指南(含 API、函式呼叫與結構化輸出)。

replicateplatformstestingwalkthroughshosted

測試 Replicate 代管模型

針對 Replicate 代管模型(含開源模型部署)的紅隊測試指南。

togetherplatformstestingwalkthroughs

測試 Together AI 平台

Together AI(含微調模型端點與自訂部署)的紅隊測試指南。

walkthroughsplatformsai21jamba

測試 AI21 Jamba 模型

AI21 Jamba 模型(含 SSM-Transformer 混合架構分析)的紅隊測試指南。

walkthroughsplatformsanyscaleray-serve

測試 Anyscale/Ray Serve 端點

透過 Anyscale 與 Ray Serve 提供服務的模型(含分散式推論安全)的紅隊測試指南。

walkthroughsplatformscoherecommand

測試 Cohere 模型:完整指南

Cohere Command 系列模型(含 RAG 整合與工具使用)的完整紅隊測試指南。

walkthroughsplatformsfireworks-aiinference

測試 Fireworks AI 部署

透過 Fireworks AI 部署的模型(含函式呼叫與微調模型)的紅隊測試指南。

walkthroughsplatformsgcpvertex-ai

測試 GCP Vertex AI 部署

透過 GCP Vertex AI 部署的模型(含 Model Garden 與自訂端點)的紅隊測試指南。

walkthroughsplatformsmistralmixtral

測試 Mistral/Mixtral:完整指南

Mistral 與 Mixtral 模型的完整紅隊測試指南,含 MoE 路由與 API 漏洞。

walkthroughsplatformsollamalocal

測試 Ollama 本地部署(平台詳解)

透過 Ollama 本地部署的模型(含 API 端點與模型管理)的紅隊測試指南。

walkthroughsplatformsreplicatehosted

測試 Replicate 代管模型

Replicate 代管模型(含 API 利用與模型版本管理)的紅隊測試指南。

walkthroughsplatformstogether-aiinference

測試 Together AI 代管模型

Together AI 推論平台代管模型的紅隊測試指南。

walkthroughsplatformsvllminference

測試 vLLM 推論部署

透過 vLLM 提供服務的模型(含批次、KV 快取與推測解碼)的紅隊測試指南。

benchmarkrunnertoolsdevelopmentwalkthroughs

安全基準測試執行器開發

建立基準測試執行器,用於跨模型與配置標準化評估 LLM 安全。

replaytoolsbuildingtoolattackwalkthroughs

建立攻擊重播工具

建立能錄製並重播攻擊序列的工具,供回歸測試與防禦驗證使用。

walkthroughscounterfitml-securitytesting

Counterfit ML 安全測試

使用 Microsoft Counterfit 對部署的模型端點進行對抗性 ML 測試。

walkthroughsfuzzercustomdevelopment

建立自訂 LLM Fuzzer

為 LLM 應用建立自訂模糊測試工具,含變異引擎與覆蓋率追蹤。

customtoolsmutationwalkthroughsengine

建立自訂載荷變異引擎

開發透過遺傳演算法演化提示詞注入載荷的變異引擎。

walkthroughsgarakconfigurationadvanced

Garak 進階配置指南

深入 Garak 配置系統(含自訂探測器、偵測器與生成器)。

garakcustomtoolsprobeswalkthroughs

建立自訂 Garak 探測器(工具詳解)

為 NVIDIA Garak 開發自訂探測模組,以測試領域特定漏洞。

customtoolsbehaviorsharmbenchwalkthroughs

HarmBench 自訂行為集

以自訂行為規格擴充 HarmBench,進行領域特定的漏洞測試。

walkthroughsharmbenchevaluationbenchmark

HarmBench 評測詳解

使用 HarmBench 框架執行標準化攻擊評測並解讀結果。

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

工具導覽

必備 AI 紅隊演練工具的端對端實務導覽，涵蓋安裝、設定、執行與結果詮釋。

入門

Inspect AI 安全評測

使用 UK AISI Inspect 框架建立並執行 AI 安全評測。

walkthroughsinspect-aisafetyevaluations

walkthroughsjailbreakbenchbenchmarktesting

JailbreakBench 使用與結果提交

使用 JailbreakBench 評估越獄技術並將結果提交至基準測試。

walkthroughsllm-guarddeploymentdefense

LLM Guard 部署與測試

部署 LLM Guard 進行輸入/輸出掃描,並測試其對常見攻擊的成效。

llmtoolstrafficanalyzerwalkthroughs

建立 LLM 流量分析工具

建立基於代理伺服器的 LLM 流量分析工具,攔截並分析 API 通訊。

integrationtoolscicdpromptfoowalkthroughs

Promptfoo CI/CD 整合

將 Promptfoo 整合至 CI/CD 管線,進行持續的 LLM 安全回歸測試。

walkthroughspromptfoored-teamevaluation

Promptfoo 紅隊評測

配置 Promptfoo 進行完整紅隊評測,含自訂斷言與評分器。

walkthroughspyritmulti-turncampaigns

PyRIT 多輪攻擊戰役編排

使用 PyRIT 編排器系統建立並執行複雜的多輪攻擊戰役。

integrationtoolspyritscoringwalkthroughs

PyRIT 自訂評分整合

將自訂評分指標整合至 PyRIT,用於組織特定的紅隊評估標準。

walkthroughstoolsagent-scannerdevelopment

代理安全掃描器開發

為 AI 代理系統建立自動化安全掃描器,測試工具使用、記憶與工作流程漏洞。

walkthroughstoolsreport-generationautomation

自動化紅隊報告生成

以測試資料與發現為輸入,自動生成結構化紅隊報告的系統。

walkthroughstoolsattack-proxydevelopment

建立 LLM 攻擊代理伺服器

為 LLM API 流量建立攔截式代理伺服器,支援載荷注入、回應分析與日誌記錄。

walkthroughstoolsbenchmarkingdefense

防禦基準測試工具開發

建立工具以標準化攻擊套件對防禦措施的成效進行基準測試。

walkthroughstoolsembedding-attackstoolkit

建立嵌入層攻擊工具組

建立用以進行嵌入層攻擊的工具組,含反演、投毒與操縱。

walkthroughsgarakplugindevelopment

Garak 自訂外掛開發

為組織特定測試開發自訂 Garak 外掛,含探測器、偵測器與生成器。

walkthroughsinspect-aicustom-solversdevelopment

Inspect AI 自訂 Solver 開發

在 Inspect AI 中建立自訂 Solver,用於專門的安全評測情境與攻擊模擬。

walkthroughstoolsdataset-curationjailbreaks

越獄資料集整理工具

建立用以整理、組織與測試越獄提示詞資料集的工具,並追蹤成效。

walkthroughstoolstraffic-analysismonitoring

LLM 流量分析工具

建立工具分析並視覺化 LLM API 流量樣態,辨識攻擊指標。

walkthroughstoolsmcp-auditsecurity

MCP 安全稽核工具

建立工具,對 MCP 伺服器實作進行常見安全漏洞與設定錯誤稽核。

walkthroughstoolstest-harnessmulti-model

多模型測試骨架建置

建立統一測試骨架,跨 OpenAI、Anthropic、Google 與本地模型端點執行攻擊。

walkthroughstoolsmutation-frameworkpayloads

載荷變異框架開發

建立具文法式、語意式與對抗式變異策略的載荷變異框架。

walkthroughspromptfooci-cdintegration

Promptfoo CI/CD 整合(工具詳解)

將 Promptfoo 紅隊評測整合至 CI/CD 管線,進行自動化安全回歸測試。

walkthroughspyritconverterdevelopment

PyRIT 自訂 Converter 開發

建立自訂 PyRIT Converter,用於載荷轉換、編碼與多模態攻擊生成。

walkthroughstoolsrag-testingframework

RAG 安全測試框架

為 RAG 應用建立系統性安全測試框架,含投毒與資料外洩。