# walkthroughs
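272 articles tagged "walkthroughs"
A2A Message Interception Walkthrough
Intercept and tamper with messages exchanged between agents in multi-agent systems built on the A2A protocol.
A2A Protocol Injection Walkthrough
Walkthrough of exploiting Google's Agent-to-Agent protocol for cross-agent prompt injection.
Adversarial Suffix Crafting Walkthrough
Craft adversarial suffixes for black-box models using gradient-based and gradient-free optimization methods.
API Abuse Chain Attack Walkthrough
Chain multiple API calls to achieve unauthorized actions that no single call would permit.
API Chaining Exploitation Walkthrough
Walkthrough of chaining multiple API calls in agent systems to carry out multi-step unauthorized actions.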
Attention Hijacking Attack Walkthrough
Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.
AutoDAN Stealthy Jailbreak Walkthrough
Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
Batch Processing Injection Walkthrough
Inject payloads through batch processing pipelines where individual items are processed without isolation.
LLM Cache Poisoning Walkthrough
Poison LLM response caches to serve adversarial content to other users without direct injection.
Competition-Style Jailbreak Techniques Walkthrough
Walkthrough of jailbreak techniques used in AI security competitions and CTF events.
Computer Use Agent Injection Walkthrough
Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Context Manipulation Deep Dive
Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.
Crescendo Multi-Turn Jailbreak Walkthrough
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
Cross-Plugin Data Exfiltration Walkthrough
Walkthrough of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.
Data Harvesting Through LLM Apps
Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.
Document-Based Injection Walkthrough
Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.
Embedding Collision Attack Walkthrough
Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.
Embedding Inversion Attack Walkthrough
Walkthrough of inverting text embeddings to recover original documents from vector databases.
Encoding Chain Attack Walkthrough
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
Encoding Chain Bypass Walkthrough
Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.
Few-Shot Example Poisoning Walkthrough
Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.
Fine-Tuning Safety Bypass Walkthrough
Walkthrough of using fine-tuning API access to remove safety behaviors from aligned models.
Function Calling Exploitation Guide
Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.
Function Schema Poisoning Walkthrough
Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.
GCG Adversarial Suffix Attack Walkthrough
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.
Real-World Indirect Prompt Injection
Walkthrough of planting and triggering indirect prompt injection in web-browsing AI assistants.
JSON Injection Attack Walkthrough
Exploit JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.
Logit Bias Exploitation Walkthrough
Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.
Many-Shot Jailbreaking Walkthrough
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Advanced Markdown Injection Walkthrough
Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.
MCP Tool Poisoning Attack Walkthrough
Walkthrough of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.
Memory Persistence Attack Walkthrough
Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.
Model Context Window Overflow Walkthrough
Overflow the context window to push safety instructions outside the effective attention range.
Model Extraction Attack Walkthrough
Walkthrough of extracting model weights/behavior through systematic API querying.
Multi-Modal Document Attack Walkthrough
Combine visual and textual injection in documents processed by multimodal LLM applications.
Multi-Turn Persona Escalation Walkthrough
Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.
Multimodal Image Injection Walkthrough
Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.
PAIR Automated Jailbreak Walkthrough
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
Agent Privilege Escalation Walkthrough
Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.
RAG Poisoning End-to-End Walkthrough
Complete walkthrough of poisoning a RAG system from document injection through information extraction.
Reasoning Model Exploitation Walkthrough
Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.
Regex Filter Bypass Walkthrough
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Role Confusion Exploitation Walkthrough
Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.
Semantic Camouflage Walkthrough
Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.
Semantic Deception Attack Walkthrough
Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.
Sleeper Agent Detection Walkthrough
Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.
Supply Chain Prompt Injection Walkthrough
Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.
Model Supply Chain Poisoning
Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.
Synthetic Identity Injection Walkthrough
Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.
TAP Tree of Attacks Walkthrough
Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.
Thought Injection in Reasoning Models Walkthrough
Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.
Timing Side-Channel Attack Walkthrough
Extract information from LLM applications through timing differences in response generation and safety filter processing.
Token-Level Attack Optimization Walkthrough
Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.
Tool Shadowing Attack Walkthrough
Register shadow tools that override legitimate tool definitions to intercept and manipulate agent actions.
Vision Model Attack Walkthrough
Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.
Vision Model Attack Walkthrough (Attack Walkthrough)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.
A2A Agent Impersonation Walkthrough
Walkthrough of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.
API Parameter Exploitation Walkthrough
Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.
Batch API Exploitation Walkthrough
Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.
Chain-of-Thought Hijacking Walkthrough
Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.
Cognitive Overload Jailbreak Walkthrough
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization Attack Walkthrough
Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.
Conversation Poisoning Attack Walkthrough
Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.
Document Parsing Exploit Chain Walkthrough
End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from the 2-shot to the many-shot regime.
Image Steganography Injection Walkthrough
Walkthrough of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.
Instruction Hierarchy Exploitation Walkthrough
Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.
Knowledge Graph Injection Attack Walkthrough
Walkthrough of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.
LangChain Exploit Chain Walkthrough
Walkthrough of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.
MCP Sampling API Abuse Walkthrough
Walkthrough of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.
Memory Injection and Persistence Walkthrough
Walkthrough of injecting persistent instructions into agent memory systems that survive across sessions.
Multi-Provider Attack Campaign Walkthrough
End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.
Output Format Exfiltration Techniques Walkthrough
Walkthrough of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.
RAG Hybrid Search Poisoning Walkthrough
Walkthrough of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.
Recursive Prompt Injection Walkthrough
Walkthrough of creating self-replicating injection payloads that persist through model output-to-input loops.
Refusal Suppression Technique Walkthrough
Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.
Semantic Obfuscation Walkthrough
Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Structured Output Data Leakage Walkthrough
Walkthrough of using structured output requirements to extract sensitive data embedded in model responses.
Tool Use Confusion Attack Walkthrough
Walkthrough of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.
Unicode Normalization Bypass Walkthrough
Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.
Voice AI Prompt Injection Walkthrough
Walkthrough of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.
LLM Watermark Analysis Walkthrough
Walkthrough of detecting and analyzing watermarks in LLM-generated text using statistical methods.
XML Injection in LLM Systems Walkthrough
Exploit XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.
XML and JSON Injection in LLM Apps
Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.
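Adversarial Robustness Testing Framework
Build a framework that continuously tests the adversarial robustness of deployed LLM defenses.
Adversarial Training for LLM Defense (Defense Walkthrough)
Implement adversarial training techniques that make LLMs more robust against prompt injection and jailbreaks.
Enforcing Agent Permission Boundaries
Implement fine-grained permission boundaries for LLM agents that restrict tool access based on context and user role.
Automated Defense Regression Testing
Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.
Automated Red Team Defense Loop
Build an automated, continuous red teaming loop that iteratively discovers and patches vulnerabilities.
Behavioral Anomaly Detection for LLMs
Implement behavioral anomaly detection to identify when model outputs deviate from the expected safety profile.
Building an Input Safety Classifier
Build a production-quality input classifier that detects prompt injection attempts.
Constitutional AI Implementation Guide
Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.
Context Isolation Pattern Implementation
Implement context isolation patterns that prevent instruction leakage between the system prompt and user data.
Data Loss Prevention for LLM Applications
Build a data loss prevention layer for LLM applications that keeps sensitive data from being exposed in outputs.
Defense Benchmarking System Walkthrough
Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.
Defense-in-Depth Architecture for LLM Apps
Design and implement a complete defense-in-depth architecture for production LLM applications.
AI Incident Response System Setup
Set up comprehensive incident response capabilities for AI-specific security incidents.
Input Embedding Firewall Deployment
Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.
Secure Input/Output Logging for Defense
Implement secure logging of LLM input/output pairs that supports forensic analysis without leaking sensitive data.
LLM Firewall Architecture Design (Defense Walkthrough)
Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.
LLM Honeypot Deployment Walkthrough
Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.
Model Behavior Monitoring Setup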
Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.
Monitoring LLM Applications for Abuse
Build a monitoring and alerting system to detect ongoing attacks against LLM applications.
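Multi-Model Safety Consensus
Implement a safety consensus mechanism that requires multiple models to agree before a sensitive action is executed.
Output Filtering Pipeline Design
Design and implement a multi-stage output filtering pipeline for LLM applications.
Output Grounding and Verification
Implement output grounding verification to ensure LLM responses are factually supported by the provided context.
Prompt Armor Implementation Guide
Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.
Prompt Injection Honeypot Deployment
Deploy honeypot prompts and canary data to detect and characterize prompt injection attempts.
RAG Input Sanitization Walkthrough
Implement input sanitization for RAG systems to prevent document-based injection attacks.
Real-Time Attack Detection System Walkthrough
Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.
Red Team-Defense Feedback Loop
Build a continuous red team-defense improvement loop with automated testing and metric tracking.
Response Watermarking Implementation
Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.
Runtime Safety Monitor Implementation
Implement a runtime safety monitor that detects and blocks unsafe model outputs in real time.
Training Custom Safety Classifiers
Train custom safety classifiers tuned to your application's specific threat model and content policy.
Secure Agent Architecture Design
Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.
Secure Function Calling Patterns Walkthrough
Implement secure function calling with input validation, output sanitization, and capability restrictions.
Secure RAG Architecture Walkthrough
Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.
Secure RAG Pipeline Architecture
Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant hardening.
System Prompt Protection Techniques
Implement multi-layered system prompt protection against extraction, override, and exfiltration attacks.
AI Security Threat Intelligence
Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.
Token-Level Input Filtering
Implement token-level input filtering that detects injection patterns at the tokenization stage, before the model processes the input.
Tool Call Authorization Framework
Implement a tool call authorization framework that validates tool invocations against policy before they are executed.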
Agent Tool Access Control Implementation
Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.
Anomaly Detection for LLM Traffic
Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.
Conversation Integrity Monitoring
Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.
Defense Effectiveness Metrics Dashboard
Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.
Automated Defense Testing Pipeline
Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.
Embedding Poisoning Detection System
Build a detection system for identifying poisoned documents in vector databases using statistical analysis.
Function Calling Guardrails Implementation
Implement guardrails for function calling that validate tool selection, parameters, and execution scope.
LLM Honeypot Deployment Guide
Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.
Incident Response Playbook for LLM Applications
Design and implement an incident response playbook specific to LLM application security incidents.
LLM Guard Production Deployment Guide
Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.
MCP Server Security Hardening Guide
Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.
Multi-Model Defense Ensemble
Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Building an Output Scanning Pipeline
Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.
PII Detection and Redaction for LLMs
Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.
Training a Prompt Injection Classifier
Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.
RAG Document Sandboxing Implementation
Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.
Adaptive Rate Limiting for LLM APIs
Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.
Secure RAG Architecture Implementation
Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.
System Prompt Protection Layers
Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.
Zero Trust Architecture for LLM Apps
Implement zero trust principles in LLM application architecture with continuous verification and least privilege.
Full Walkthrough: AI Code Assistant
End-to-end engagement for assessing an AI-powered code assistant with repository access.
Full Walkthrough: Content Generation Platform
Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.
Full Walkthrough: Customer Analytics AI Security Assessment
Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.
Full Walkthrough: AI Developer Tools
End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.
Full Walkthrough: Educational AI Platform
End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.
Full Walkthrough: AI Financial Advisor
Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.
Full Walkthrough: Government AI System
Full red team engagement for a government-deployed AI system with classification and compliance requirements.
Full Walkthrough: Healthcare AI System
End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.
Full Walkthrough: Internal Knowledge Bot Security Assessment
Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.
Full Walkthrough: Multi-Agent Platform Security Assessment
Full red team engagement of a multi-agent platform with MCP servers and A2A communication.
Full Walkthrough: SaaS Customer Chatbot
End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.
Full Walkthrough: ML Supply Chain Audit
End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.
Full Walkthrough: Autonomous Coding Agent
End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.
Full Walkthrough: AI Content Generation Platform
Red team engagement of an AI content generation platform used for marketing, SEO, and social media.
Full Walkthrough: Multi-Tool Customer Support Agent
End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.
Full Walkthrough: DevOps AI Assistant
End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.
Full Walkthrough: Document Processing Pipeline
Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.
Full Walkthrough: AI Education Platform
End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.
Full Walkthrough: Government Citizen Services AI
Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.
Full Walkthrough: HR Benefits Chatbot
Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.
Full Walkthrough: AI Insurance Underwriting
Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.
Full Walkthrough: Legal Research AI System
Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.
Full Walkthrough: Enterprise Multi-Agent System
Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.
Full Walkthrough: Retail Personalization AI
Red team engagement of an AI personalization engine with access to customer profiles and purchase history.
Full Walkthrough: AI Security Copilot
Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.
Full Walkthrough: Supply Chain AI Optimizer
End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.
Full Walkthrough: Telehealth AI Assistant
End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.
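Walkthroughs
Step-by-step guided walkthroughs covering red team tooling, engagement methodology, defense implementation, platform-specific testing, and complete engagement workflows.
Adversarial Simulation Design
Design adversarial simulations that closely mirror real threat actor behavior and attack scenarios for red teaming AI systems.
Agentic System Assessment Methodology
A complete assessment methodology for agentic AI systems, covering tool use, memory, and multi-agent interaction.
AI Penetration Test Planning
A complete methodology for planning AI-specific penetration tests, covering scoping, resource allocation, and timelines.
AI Red Team Maturity Model (Methodology Walkthrough)
A maturity model for assessing and improving an organization's AI red teaming capabilities.
AI Security Metrics Framework
A framework for measuring and reporting AI security posture with quantitative metrics.
AI Vulnerability Classification System
A structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.
Attack Prioritization Framework
Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.
Attack Tree Construction for LLM Systems
Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.
Automated AI Reconnaissance Workflow
Build an automated reconnaissance workflow that enumerates the architecture, models, and defensive configuration of AI applications.
Collaborative AI Red Team Assessments
Coordinate multi-person red team assessments, including role assignment, communication protocols, and finding deconfliction.
Competitive Analysis of AI Security Tools
A methodology for evaluating and comparing AI security tools for red team operations.
Compliance-Driven Testing Methodology
Map regulatory requirements to concrete test cases to support compliance-driven AI red team assessments.
Continuous Monitoring Integration Methodology
Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.
Finding Deduplication and Triage
Deduplicate and triage findings from automated and manual testing to produce actionable, prioritized vulnerability reports.
Multi-Model Testing Methodology
A structured testing methodology for applications that use multiple LLMs in their processing pipeline.
Post-Engagement Analysis Methodology
Conduct a thorough post-engagement analysis, including lessons learned, technique effectiveness, and methodology refinement.
Purple Team AI Assessment Methodology
Run AI assessments with a collaborative purple team approach, with real-time feedback between red and blue team operations.
AI Security Regression Testing Methodology
Design regression test suites that verify security fixes remain effective after model updates and deployments.
Risk-Based AI Testing Approach
Apply a risk-based testing approach that focuses assessment resources on the highest-impact vulnerability classes.
Stakeholder Management for AI Red Teaming
Manage stakeholder expectations and communication throughout an AI red team engagement.
Stakeholder-Oriented Reporting Methodology
Tailor red team reports to different stakeholders (executives, developers, security teams, compliance staff).
AI Security Tabletop Exercises
Design and facilitate tabletop exercises built around AI security incident scenarios.
Time-Boxed AI Assessment Methodology
Maximize test coverage under strict time constraints using prioritized attack trees and parallel testing.
AI Attack Surface Enumeration Methodology
A methodology for systematically enumerating the full attack surface of AI-powered applications.
AI Compliance Testing Methodology
A methodology for testing AI systems against regulatory compliance requirements, including the EU AI Act and NIST.
Mapping Findings to MITRE ATLAS
A methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.
AI Penetration Test Report Structure
A detailed report structure for AI penetration tests, including finding templates and severity scoring.
AI Red Team Scoping Templates
Templates and procedures for scoping AI red team engagements across different application types.
AI Risk Quantification Methodology
A methodology for quantitative risk assessment of AI vulnerabilities, including likelihood and impact scoring.
AI Security Tabletop Exercise Design
Design and facilitate AI security tabletop exercises to evaluate organizational readiness.
Evidence Collection During AI Testing
Best practices for collecting, organizing, and preserving evidence during AI red team assessments.
Multi-Model Assessment Methodology
A methodology for assessing applications that use multiple AI models in pipeline or ensemble configurations.
OWASP LLM Top 10 Testing Methodology
A complete testing methodology for every vulnerability in the OWASP LLM Top 10 2025.
Purple Team Operations for AI Security
A purple team methodology that combines red team attacks with blue team defense improvements.
AI Security Regression Testing Methodology (Methodology Walkthrough)
A methodology for continuous regression testing of AI applications after updates and model changes.
Communicating AI Risk to Stakeholders
A guide to effectively communicating AI security risks to technical and non-technical stakeholders.
Threat Intelligence for AI Systems
A methodology for gathering and applying threat intelligence on attacks against, and defenses for, AI systems.
AI Vulnerability Prioritization Framework
A framework for ranking AI vulnerabilities by exploitability, impact, and remediation cost.
Writing AI Red Team Reports
A guide to writing clear, actionable AI red team assessment reports, including findings and recommendations.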
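Testing AI21 Labs Models
A red team testing guide for AI21 Labs Jamba models, including their long-context and efficiency features.
Testing Anthropic Claude: Complete Guide
A complete red team testing guide for Anthropic Claude, including tool use, extended thinking, and computer use.
Testing AWS Bedrock Deployments
A red team testing guide for models deployed through AWS Bedrock, including guardrails and access controls.
Testing Azure OpenAI Service
A red team testing guide for Azure OpenAI, including content filtering, managed identities, and network controls.
Testing Cohere Models
A red team testing guide for Cohere Command-R series models, including their RAG and tool-use features.
Testing DeepSeek Models
A red team testing guide for DeepSeek models, including R1 reasoning exploitation and the MoE architecture.
Testing the Fireworks AI Platform
A red team testing guide for Fireworks AI, including function calling and compound AI systems.
Testing Google Gemini: Complete Guide
A complete red team testing guide for Google Gemini, including multimodal inputs and long context.
Testing the Groq Inference Platform
A red team testing guide for the Groq high-speed inference platform and its security characteristics.
Testing Hugging Face Hosted Models
A red team testing guide for models hosted on Hugging Face, including the Inference API and Spaces.
Testing Meta Llama: Complete Guide
A complete red team testing guide for Meta Llama models, including fine-tuning and self-hosted deployments.
Testing Mistral AI Models
A complete red team testing guide for Mistral AI models, covering the Mixtral MoE architecture and chat endpoints.
Testing Ollama Local Deployments
A security testing guide for models deployed locally with Ollama, including network exposure and API security.
Testing OpenAI GPT-4o: Complete Guide
A complete red team testing guide for OpenAI GPT-4o, including the API, function calling, and structured outputs.
Testing Replicate Hosted Models
A red team testing guide for models hosted on Replicate, including open-source model deployments.
Testing the Together AI Platform
A red team testing guide for Together AI, including fine-tuned model endpoints and custom deployments.
Testing AI21 Jamba Models
A red team testing guide for AI21 Jamba models, including analysis of the hybrid SSM-Transformer architecture.
Testing Anyscale/Ray Serve Endpoints
A red team testing guide for models served through Anyscale and Ray Serve, including distributed inference security.
Testing Cohere Models: Complete Guide
A complete red team testing guide for Cohere Command series models, including RAG integration and tool use.
Testing Fireworks AI Deployments
A red team testing guide for models deployed through Fireworks AI, including function calling and fine-tuned models.
Testing GCP Vertex AI Deployments
A red team testing guide for models deployed through GCP Vertex AI, including Model Garden and custom endpoints.
Testing Mistral/Mixtral: Complete Guide
A complete red team testing guide for Mistral and Mixtral models, including MoE routing and API vulnerabilities.
Testing Ollama Local Deployments (Platform Walkthrough)
A red team testing guide for models deployed locally with Ollama, including API endpoints and model management.
Testing Replicate Hosted Models
A red team testing guide for models hosted on Replicate, including API exploitation and model versioning.
Testing Together AI Hosted Models
A red team testing guide for models hosted on the Together AI inference platform.
Testing vLLM Inference Deployments
A red team testing guide for models served with vLLM, including batching, the KV cache, and speculative decoding.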
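Safety Benchmark Runner Development
Build a benchmark runner for standardized LLM safety evaluation across models and configurations.
Building an Attack Replay Tool
Build a tool that records and replays attack sequences for regression testing and defense validation.
ML Security Testing with Counterfit
Use Microsoft Counterfit to run adversarial ML tests against deployed model endpoints.
Building a Custom LLM Fuzzer
Build a custom fuzzing tool for LLM applications, with a mutation engine and coverage tracking.
Building a Custom Payload Mutation Engine
Develop a mutation engine that evolves prompt injection payloads using genetic algorithms.
Garak Advanced Configuration Guide
A deep dive into Garak's configuration system, including custom probes, detectors, and generators.
Building Custom Garak Probes (Tool Walkthrough)
Develop custom probe modules for NVIDIA Garak to test for domain-specific vulnerabilities.
HarmBench Custom Behavior Sets
Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.
HarmBench Evaluation Walkthrough
Use the HarmBench framework to run standardized attack evaluations and interpret the results.
Tool Walkthroughs
Hands-on, end-to-end walkthroughs of essential AI red teaming tools, covering installation, configuration, execution, and interpretation of results.
Inspect AI Safety Evaluations
Build and run AI safety evaluations with the UK AISI Inspect framework.
JailbreakBench Usage and Result Submission
Evaluate jailbreak techniques with JailbreakBench and submit results to the benchmark.
LLM Guard Deployment and Testing
Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.
Building an LLM Traffic Analysis Tool
Build a proxy-based LLM traffic analysis tool that intercepts and analyzes API communication.
Promptfoo CI/CD Integration
Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.
Promptfoo Red Team Evaluation
Configure Promptfoo for comprehensive red team evaluations, including custom assertions and graders.
PyRIT Multi-Turn Attack Campaign Orchestration
Build and run complex multi-turn attack campaigns with PyRIT's orchestrator system.
PyRIT Custom Scoring Integration
Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.
Agent Security Scanner Development
Build an automated security scanner for AI agent systems that tests for tool-use, memory, and workflow vulnerabilities.
Automated Red Team Report Generation
A system that automatically generates structured red team reports from test data and findings.
Building an LLM Attack Proxy
Build an intercepting proxy for LLM API traffic that supports payload injection, response analysis, and logging.
Defense Benchmarking Tool Development
Build tooling that benchmarks the effectiveness of defenses against standardized attack suites.
Building an Embedding Attack Toolkit
Build a toolkit for embedding-level attacks, including inversion, poisoning, and manipulation.
Garak Custom Plugin Development
Develop custom Garak plugins for organization-specific testing, including probes, detectors, and generators.
Inspect AI Custom Solver Development
Build custom Solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.
Jailbreak Dataset Curation Tool
Build a tool for curating, organizing, and testing jailbreak prompt datasets while tracking their effectiveness.
LLM Traffic Analysis Tool
Build tools that analyze and visualize LLM API traffic patterns to identify indicators of attack.
MCP Security Audit Tool
Build a tool that audits MCP server implementations for common security vulnerabilities and misconfigurations.
Multi-Model Test Harness
Build a unified test harness that runs attacks across OpenAI, Anthropic, Google, and local model endpoints.
Payload Mutation Framework Development
Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.
Promptfoo CI/CD Integration (Tool Walkthrough)
Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.
PyRIT Custom Converter Development
Build custom PyRIT Converters for payload transformation, encoding, and multimodal attack generation.
RAG Security Testing Framework
Build a systematic security testing framework for RAG applications, covering poisoning and data leakage.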