# defense

Defense Destroyer Challenge

Bypass a state-of-the-art multi-layer defense system to extract a protected secret from the model.

destroyerlabsdefensectf

Expert

Defense Gauntlet: Level 2 — Layered Security

Bypass a layered defense system with input classification, guardrails, and output filtering.

labsctfdefenselayered

ctfdefenseblue-teamguardrails

CTF: Defense Challenge (Blue Team)

Blue team CTF challenge: build and defend an AI chatbot against a series of increasingly sophisticated automated attacks.

labsctfdefenseconstruction

Defense Builder: Construct and Survive

Build defensive guardrails for an LLM application and survive 100 automated attack rounds.

labdefensemetricsmeasurementguardrailsintermediate

Lab: Defense Effectiveness Measurement

Hands-on lab on quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.

labsdefenseregression-testingintermediate

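Lab: Defense Regression Testing Setup

Build a regression testing framework that continuously verifies that LLM defenses remain effective against known attack patterns.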

simulationdefensechatbotguardrailsblue-team

Simulation: Build & Defend a Chatbot

Defense simulation where you build a chatbot with layered defenses, test it against a standardized attack suite, measure defense effectiveness, and iterate on weaknesses.

simulationdefensedefense-in-depthautomationblue-team

Simulation: Defense in Depth

Expert-level defense simulation implementing a full defense stack including input filter, output monitor, rate limiter, anomaly detector, and circuit breaker, then measuring effectiveness against automated attacks.

Expert

Simulation: Guardrail Engineering

Defense simulation where you design and implement a multi-layer guardrail system, test it against progressively sophisticated attacks, and document false positive/negative rates.

simulationdefenseguardrailsengineeringblue-team

simulationdefensesocmonitoringincident-response

Simulation: AI SOC Simulation

Defense simulation where you set up monitoring for an AI application, then respond to simulated attacks by practicing alert triage, investigation, and escalation procedures.

defensemultimodalcross-modalperceptual-hashingnsfwsafety

Multimodal Defense Strategies

Comprehensive defense approaches for multimodal AI systems: cross-modal verification, perceptual hashing, NSFW detection, input sanitization, and defense-in-depth architectures.

multimodaldefensesafetymonitoringsanitization

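Defending Multimodal AI Systems

Comprehensive defense strategies for multimodal AI systems, including input sanitization, cross-modal safety classifiers, instruction hierarchy, and monitoring for adversarial multimodal inputs.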

mechanismreferencescatalogdefense

Defense Mechanism Catalog

A catalog of defense mechanisms with effectiveness ratings for each attack category.

referencecomparisondefensemechanisms

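Defense Mechanism Comparison

A complete comparison of LLM defense mechanisms, including guardrails, classifiers, filtering, and architectural approaches, with effectiveness data.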

securitychecklistdefensequick-referencebest-practices

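LLM Security Checklist

A complete security checklist for LLM-powered applications, covering input validation, prompt hardening, output filtering, tool security, RAG pipelines, and incident response.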

purple-teamcollaborationdefenseattackmethodologytradecraftadvanced

Purple Teaming for AI

Collaborative attack-defense exercises for AI systems: structuring purple team engagements, real-time knowledge transfer, joint attack simulation, and measuring defensive improvement through iterative testing.

walkthroughsdefenserobustness-testingframework

Adversarial Robustness Testing Framework

Build a framework that continuously tests the adversarial robustness of deployed LLM defenses.

walkthroughsdefenseadversarial-trainingrobustness

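Adversarial Training for LLM Defense (Defense Walkthrough)

Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaks.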

walkthroughsdefenseagent-permissionsboundaries

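Agent Permission Boundary Enforcement

Implement fine-grained permission boundaries for LLM agents that restrict tool access by context and user role.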

automatedtestingwalkthroughsdefense

Automated Defense Regression Testing

Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.

walkthroughsdefenseautomated-red-teamcontinuous

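Automated Red Team Defense Loop

Build an automated, continuous red team testing loop that iteratively discovers and patches vulnerabilities.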

walkthroughsdefensebehavioral-anomalydetection

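Behavioral Anomaly Detection for LLMs

Implement behavioral anomaly detection to identify when model outputs deviate from the expected safety profile.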

walkthroughsdefenseclassifierinput-safety

Building an Input Safety Classifier

Build a production-quality input classifier that detects prompt injection attempts.

input-sanitizationprompt-injectiondefensellm-securityinput-validationwalkthrough

Building a Production Input Sanitizer

Step-by-step walkthrough for building a production-grade input sanitizer that cleans, normalizes, and validates user prompts before they reach an LLM, covering encoding normalization, injection pattern stripping, length enforcement, and integration testing.

canary-tokensprompt-injectiondetectionmonitoringdefensewalkthrough

Canary Token Deployment

Step-by-step walkthrough for deploying canary tokens in LLM system prompts and context to detect prompt injection and data exfiltration attempts, covering token generation, placement strategies, monitoring, and alerting.

access-controlcapabilitiespermissionsauthorizationdefensewalkthrough

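Capability-Based Access Control

Step-by-step walkthrough for implementing fine-grained capability controls for LLM features, covering capability token design, permission scopes, dynamic capability grants, and audit trails.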

walkthroughsdefenseconstitutional-aialignment

Constitutional AI Implementation Guide

Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.

constitutional-aiclassifierprinciplessafetydefensewalkthrough

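Constitutional Classifier Setup

Step-by-step walkthrough for implementing a Constitutional AI-style classifier that evaluates LLM outputs against a set of principles, covering principle definition, classifier training, chain-of-thought evaluation, and deployment.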

content-filteringdefenseclassifiersmoderationllm-judgeimplementationwalkthrough

Setting Up Content Filtering

Step-by-step walkthrough for implementing multi-layer content filtering for AI applications: keyword filtering, classifier-based detection, LLM-as-judge evaluation, testing effectiveness, and tuning for production.

walkthroughsdefensecontext-isolationpatterns

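Context Isolation Pattern Implementation

Implement context isolation patterns to prevent instruction leakage between system prompts and user data.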

walkthroughsdefensedlpdata-protection

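Data Loss Prevention for LLM Applications

Build a data loss prevention layer for LLM applications to keep sensitive data from being exposed in outputs.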

benchmarkingsystemwalkthroughsdefense

Defense Benchmarking System Walkthrough

Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.

walkthroughsdefensedefense-in-deptharchitecture

Defense-in-Depth Architecture for LLM Apps

Design and implement a complete defense-in-depth architecture for production LLM applications.

nemo-guardrailsnvidiaguardrailscolangdefensewalkthrough

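Deploying NeMo Guardrails

Step-by-step walkthrough for setting up NVIDIA NeMo Guardrails in production, covering installation, Colang configuration, custom actions, topical and safety rails, testing, and monitoring.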

dual-llmarchitecturevalidationdefense-in-depthdefensewalkthrough

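Dual LLM Architecture Setup

Step-by-step walkthrough for implementing the dual LLM pattern, in which one model generates responses and another validates them, covering architecture design, validator prompt engineering, latency optimization, and failure handling.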

hallucinationdetectionfactual-groundingoutput-filteringdefensewalkthrough

Hallucination Detection

Step-by-step walkthrough for detecting and flagging hallucinated content in LLM outputs, covering factual grounding checks, self-consistency verification, source attribution validation, and confidence scoring.

guardrailsinput-validationprompt-injection-defensecontent-safetydefensewalkthrough

Building Input Guardrails for LLM Applications

Step-by-step walkthrough for implementing production-grade input guardrails that protect LLM applications from prompt injection, content policy violations, and resource abuse through multi-layer validation, classification, and rate limiting.

incident-responseplaybookbreach-responsecontainmentinvestigationdefensewalkthrough

Incident Response Playbook for AI Security Breaches

Walkthrough for building an incident response playbook tailored to AI security breaches, covering detection triggers, triage procedures, containment strategies, investigation workflows, remediation validation, and post-incident review processes.

setupdefenseresponseincidentwalkthroughs

AI Incident Response System Setup

Set up comprehensive incident response capabilities for AI-specific security incidents.

defenseguardrailsmonitoringincident-responseimplementationwalkthrough

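Defense Implementation Walkthroughs

Step-by-step guides for implementing AI security defenses: guardrail configuration, monitoring and detection setup, and incident response preparation for AI systems.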

walkthroughsdefenseembedding-firewallinput-filtering

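Input Embedding Firewall Deployment

Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.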

walkthroughsdefenseloggingforensics

Secure Input/Output Logging for Defense

Implement secure logging of LLM input/output pairs that supports forensic analysis without leaking sensitive data.

instruction-hierarchyprompt-injectionprivilege-separationdefenseadvancedwalkthrough

Instruction Hierarchy Enforcement (Defense Walkthrough)

Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.

walkthroughsdefensellm-firewallarchitecture

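LLM Firewall Architecture Design (Defense Walkthrough)

Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.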

deploymentdefensellmhoneypotwalkthroughs

LLM Honeypot Deployment Walkthrough

Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.

llm-judgeoutput-validationsafetyevaluationdefensewalkthrough

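LLM Judge Implementation

Step-by-step walkthrough for using one LLM to judge another LLM's outputs for safety and quality, covering judge prompt design, scoring rubrics, calibration, cost optimization, and deployment patterns.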

defensemonitoringmodelbehaviorwalkthroughs

Model Behavior Monitoring Setup

Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.

output-validationsanitizationresponse-safetyschema-validationdefensewalkthrough

Validating and Sanitizing Model Outputs

Walkthrough for building output validation systems that verify LLM responses meet structural, factual, and safety requirements before delivery, covering schema validation, factual grounding checks, response consistency verification, and safe rendering.

walkthroughsdefensemonitoringabuse-detection

Monitoring LLM Applications for Abuse

Build a monitoring and alerting system to detect ongoing attacks against LLM applications.

monitoringobservabilitysecurity-eventsalertingproductiondefensewalkthrough

Production Monitoring for LLM Security Events

Walkthrough for building production monitoring systems that detect LLM security events in real time, covering log collection, anomaly detection, alert configuration, dashboard design, and incident correlation.

defense-in-depthinput-validationpipelinemulti-layerdefensewalkthrough

Multi-Layer Input Validation

Step-by-step walkthrough for building a defense-in-depth input validation pipeline that combines regex matching, semantic similarity, ML classification, and rate limiting into a unified validation system for LLM applications.

walkthroughsdefensemulti-modelconsensus

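Multi-Model Safety Consensus

Implement a safety consensus mechanism that requires multiple models to agree before a sensitive action is executed.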

output-filteringclassifiercontent-moderationsafetydefensewalkthrough

Output Content Classifier

Step-by-step walkthrough for building a classifier to filter harmful LLM outputs, covering taxonomy definition, multi-label classification, threshold calibration, and deployment as a real-time output gate.

walkthroughsdefenseoutput-filteringpipeline

Output Filtering Pipeline Design

Design and implement a multi-stage output filtering pipeline for LLM applications.

output-filteringcontent-safetypii-detectionresponse-validationdefensewalkthrough

Output Filtering and Content Safety Implementation

Walkthrough for building output filtering systems that inspect and sanitize LLM responses before they reach users, covering content classifiers, PII detection, response validation, canary tokens, and filter bypass resistance.

walkthroughsdefenseoutput-groundingverification

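Output Grounding and Verification

Implement output grounding verification to ensure LLM responses are factually supported by the provided context.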

piiredactionprivacyoutput-filteringcompliancedefensewalkthrough

PII Redaction Pipeline

Step-by-step walkthrough for building an automated PII detection and redaction pipeline for LLM outputs, covering regex-based detection, NER-based detection, Presidio integration, redaction strategies, and compliance testing.

defenseimplementationarmorpromptwalkthroughs

Prompt Armor Implementation Guide

Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.

classifiermachine-learningprompt-injectiondetectiontrainingdefensewalkthrough

Prompt Classifier Training

Step-by-step walkthrough for training a machine learning classifier to detect malicious prompts, covering dataset curation, feature engineering, model selection, training pipeline, evaluation, and deployment as a real-time detection service.

prompt-injectionmachine-learningdetectionclassifierdefensewalkthrough

ML-Based Prompt Injection Detection Systems

Walkthrough for building and deploying ML-based prompt injection detection systems, covering training data collection, feature engineering, model architecture selection, threshold tuning, production deployment, and continuous improvement.

walkthroughsdefensehoneypotdetection

Prompt Injection Honeypot Deployment

Deploy honeypot prompts and canary data to detect and characterize prompt injection attempts.

ragaccess-controlretrievaldata-leakageauthorizationdefensewalkthrough

Implementing Access Control in RAG Pipelines

Walkthrough for building access control systems in RAG pipelines that enforce document-level permissions, prevent cross-user data leakage, filter retrieved context based on user authorization, and resist retrieval poisoning attacks.

walkthroughsdefenseragsanitization

RAG Input Sanitization Walkthrough

Implement input sanitization for RAG systems to prevent document-based injection attacks.

rate-limitingabuse-preventionapi-securitytoken-bucketcost-controldefensewalkthrough

Rate Limiting and Abuse Prevention for LLM APIs

Walkthrough for implementing rate limiting and abuse prevention systems for LLM API endpoints, covering token bucket algorithms, per-user quotas, cost-based limiting, anomaly detection, and graduated enforcement.

rate-limitingdefenseabuse-preventiontokensthrottlingimplementationwalkthrough

AI Rate Limiting Walkthrough

Step-by-step walkthrough for implementing token-aware rate limiting for AI applications: request-level limiting, token budget enforcement, sliding window algorithms, abuse detection, and production deployment.

defenserealtimedetectionattackwalkthroughs

Real-Time Attack Detection System Walkthrough

Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.

walkthroughsdefensered-teamfeedback-loop

Red Team-Defense Feedback Loop

Build a continuous red team-defense improvement loop with automated testing and metric tracking.

regexprompt-injectionpattern-matchinginput-filteringdefensewalkthrough

Regex-Based Prompt Filter

Step-by-step walkthrough for building a regex-based prompt filter that detects common injection payloads using pattern matching, covering pattern library construction, performance optimization, false positive management, and continuous updates.

response-boundariesoutput-filteringcontent-policyguardrailsdefensewalkthrough

Response Boundary Enforcement

Step-by-step walkthrough for keeping LLM responses within defined topic, format, and content boundaries, covering boundary definition, violation detection, response rewriting, and boundary drift monitoring.

walkthroughsdefensewatermarkingtraceability

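Response Watermarking Implementation

Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.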

walkthroughsdefenseruntime-monitorsafety

Runtime Safety Monitor Implementation

Implement a runtime safety monitor that detects and blocks unsafe model outputs in real time.

walkthroughsdefensesafety-classifiertraining

Training a Custom Safety Classifier

Train a custom safety classifier tuned to your application's specific threat model and content policy.

sandboxtool-executionisolationcontainerssecuritydefensewalkthrough

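Sandboxed Tool Execution

Step-by-step walkthrough for executing LLM tool calls in isolated sandboxes, covering container-based isolation, resource limits, network restrictions, and output sanitization.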

sandboxingtool-usepermissionsagent-securityleast-privilegedefensewalkthrough

Sandboxing and Permission Models for Tool-Using Agents

Walkthrough for implementing sandboxing and permission models that constrain tool-using LLM agents, covering least-privilege design, parameter validation, execution sandboxes, approval workflows, and audit logging.

architecturesecuredefenseagentwalkthroughs

Secure Agent Architecture Design

Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.

callingsecuredefensepatternsfunctionwalkthroughs

Secure Function Calling Patterns Walkthrough

Implement secure function calling with input validation, output sanitization, and capability restrictions.

architecturesecuredefenseragwalkthroughs

Secure RAG Architecture Walkthrough

Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.

walkthroughsdefensesecure-ragpipeline

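Secure RAG Pipeline Architecture

Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.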

embeddingssemantic-similarityprompt-injectionvector-databasedefensewalkthrough

Semantic Similarity Detection

Step-by-step walkthrough for using text embeddings to detect semantically similar prompt injection attempts, covering embedding model selection, vector database setup, similarity threshold tuning, and production deployment.

session-isolationmulti-tenancysecuritydefensewalkthrough

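Session Isolation Patterns

Step-by-step walkthrough for isolating user sessions in LLM applications to prevent cross-contamination of context, memory, and permissions between users.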

structured-outputvalidationjson-schemaoutput-filteringdefensewalkthrough

Structured Output Validation

Step-by-step walkthrough for validating structured LLM outputs against schemas, covering JSON schema validation, type coercion, constraint enforcement, and handling malformed model outputs gracefully.

walkthroughsdefensesystem-promptprotection

System Prompt Protection Techniques

Implement multi-layer system prompt protection against extraction, override, and leakage attacks.

threatsecuritydefenseintelligencewalkthroughs

AI Security Threat Intelligence

Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.

walkthroughsdefensetoken-filteringinput-safety

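Token-Level Input Filtering

Implement token-level input filtering that detects injection patterns at the tokenization stage, before the model processes the input.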

walkthroughsdefensetool-authorizationframework

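Tool Call Authorization Framework

Implement a tool call authorization framework that validates tool invocations against policy before execution.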

toxicityscoringoutput-filteringcontent-moderationsafetydefensewalkthrough

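Toxicity Scoring Pipeline

Step-by-step walkthrough for building a toxicity scoring pipeline for LLM output filtering, covering model selection, multi-dimensional scoring, threshold calibration, and production deployment of real-time scoring.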

unicodenormalizationencodinghomoglyphdefensewalkthrough

Unicode Normalization Defense

Step-by-step walkthrough for implementing Unicode normalization to prevent encoding-based prompt injection bypasses, covering homoglyph detection, invisible character stripping, bidirectional text handling, and normalization testing.

walkthroughsdefenseagentaccess-control

Agent Tool Access Control Implementation

Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.

walkthroughsdefenseanomaly-detectiontraffic

Anomaly Detection for LLM Traffic

Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.

walkthroughsdefenseconversation-integritymonitoring

Conversation Integrity Monitoring

Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.

walkthroughsdefensemetricsdashboard

Defense Effectiveness Metrics Dashboard

Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.

walkthroughsdefensetestingautomation

Automated Defense Testing Pipeline

Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.

walkthroughsdefenseembeddingpoisoning-detection

Embedding Poisoning Detection System

Build a detection system for identifying poisoned documents in vector databases using statistical analysis.

walkthroughsdefensefunction-callingguardrails

Function Calling Guardrails Implementation

Implement guardrails for function calling that validate tool selection, parameters, and execution scope.

walkthroughsdefensehoneypotdeception

LLM Honeypot Deployment Guide

Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.

walkthroughsdefenseincident-responseplaybook

Incident Response Playbook for LLM Applications

Design and implement an incident response playbook specific to LLM application security incidents.

walkthroughsdefensellm-guardproduction

LLM Guard Production Deployment Guide

Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.

walkthroughsdefensemcpserver-hardening

MCP Server Security Hardening Guide

Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.

walkthroughsdefensemulti-modelensemble

Multi-Model Defense Ensemble

Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.

walkthroughsdefensenemo-guardrailsadvanced

NeMo Guardrails Advanced Configuration

Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.

walkthroughsdefenseoutput-scanningpipeline

Building an Output Scanning Pipeline

Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.

walkthroughsdefensepiiredaction

PII Detection and Redaction for LLMs

Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.

walkthroughsdefenseclassifiertraining

Training a Prompt Injection Classifier

Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.

walkthroughsdefenseragsandboxing

RAG Document Sandboxing Implementation

Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.

walkthroughsdefenserate-limitingadaptive

Adaptive Rate Limiting for LLM APIs

Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.

walkthroughsdefenseragsecure-architecture

Secure RAG Architecture Implementation

Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.

walkthroughsdefensesystem-promptprotection

System Prompt Protection Layers

Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.

architecturetrustdefensellmzerowalkthroughs

Zero Trust Architecture for LLM Apps

Implement zero trust principles in LLM application architecture with continuous verification and least privilege.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

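Walkthroughs

Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and complete engagement workflows.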

walkthroughsllm-guarddeploymentdefense

LLM Guard Deployment and Testing

Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.

walkthroughstoolsbenchmarkingdefense

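Defense Benchmarking Tool Development

Build tools that benchmark the effectiveness of defense measures against standardized attack suites.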