# prompt-injection
標記為「prompt-injection」的 259 篇文章
Orchestrator Compromise Techniques
Advanced techniques for compromising central orchestration agents in multi-agent systems, including result poisoning, recursive task bombs, capability confusion, consensus manipulation, fallback path exploitation, and orchestrator prompt injection.
Prompt Injection Chain Analysis
Analyzing chains of prompt injection attacks across multi-step AI systems, including indirect injection propagation, agentic exploitation, and cross-system attack correlation.
Prompt Injection Forensics
Forensic investigation techniques for prompt injection incidents including log analysis and payload reconstruction.
Fundamentals Practice Exam
25-question practice exam covering LLM fundamentals, prompt injection basics, safety mechanisms, red team methodology, and AI threat landscape at an intermediate level.
Practice Exam 1: AI Red Team Fundamentals
25-question practice exam covering LLM architecture, prompt injection, agent exploitation, defense mechanisms, and red team methodology at an intermediate level.
Advanced Prompt Injection Practice Exam
Practice exam on indirect injection, multi-step chains, encoding bypass, and semantic manipulation.
Prompt Injection Mastery Assessment
Comprehensive assessment covering direct injection, indirect injection, and defense evasion techniques.
Prompt Injection Assessment
Test your knowledge of prompt injection types, techniques, defense mechanisms, and real-world exploitation with 15 intermediate-level questions.
Skill Verification: Prompt Injection
Hands-on skill verification requiring live exploitation of prompt injection vulnerabilities.
Skill Verification: Prompt Injection (Assessment)
Timed skill verification lab: extract a system prompt from a defended AI system within 15 minutes using prompt injection techniques.
Capstone: Build a Prompt Injection Detection Scanner
Build a production-grade prompt injection scanner that combines static analysis, ML classification, and runtime monitoring to detect injection attacks across LLM applications.
Case Study: MCP Tool Poisoning Attacks (Invariant Labs 2025)
Analysis of tool poisoning vulnerabilities in the Model Context Protocol (MCP) discovered by Invariant Labs, where malicious tool descriptions manipulate AI agents into data exfiltration and unauthorized actions.
Case Study: Prompt Injection Attacks on Google Bard/Gemini
Analysis of prompt injection vulnerabilities discovered in Google Bard (later Gemini), including indirect injection through Google Workspace integration and the unique attack surface created by multimodal capabilities.
Bedrock Attack Surface
Comprehensive red team methodology for Amazon Bedrock: model invocation API abuse, guardrails bypass techniques, custom model endpoint exploitation, IAM misconfigurations, knowledge base poisoning, and Bedrock Agents exploitation.
Context Manipulation for Code Suggestions
Techniques for manipulating AI coding assistant context: crafting files that influence suggestions, README-driven attacks, and comment injection to steer code generation.
Prompt Injection via Code
Hiding prompt injection payloads in code artifacts — through comments, variable names, strings, docstrings, and file metadata — to attack AI coding tools and code-processing pipelines.
January 2026: System Prompt Extraction Challenge
Extract system prompts from five increasingly defended chatbots, progressing from unprotected to heavily hardened configurations.
March 2026: Agent Exploitation Challenge
Compromise a multi-tool agent system through prompt injection and tool abuse, completing multiple objectives with escalating difficulty and point values.
Patterns for Hardening System Prompts
Practical patterns and techniques for hardening LLM system prompts against injection, extraction, and manipulation attacks, including structural defenses, instruction hierarchy, delimiter strategies, and defense-in-depth approaches.
AI Exploit Development
Adversarial suffix generation, gradient-free optimization, WAF-evading injection payloads, and fuzzing frameworks for AI systems.
Crafting Adversarial Payloads
Systematic methodology for creating effective prompt injection payloads, including template design, optimization techniques, and multi-technique combination strategies.
Formal Models of Prompt Injection
Theoretical frameworks for formally modeling and reasoning about prompt injection vulnerabilities.
Advanced Prompt Injection
Expert techniques for instruction hierarchy exploitation, multi-stage injection chains, indirect injection via structured data, payload obfuscation, and quantitative attack measurement.
Prompt Injection in Production Systems
Real-world case studies of prompt injection exploits in production AI deployments, including attack timelines, impact analysis, and lessons learned.
Injection Research
Advanced research in prompt injection, jailbreak automation, and multimodal attack vectors, covering cutting-edge techniques that push beyond standard injection approaches.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: Advanced Token Smuggling via Unicode Normalization
Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Simulation: Enterprise Chatbot Engagement
Full red team engagement simulation targeting a customer-facing chatbot deployed by a fictional e-commerce company, covering reconnaissance, prompt injection, data exfiltration, and PII harvesting.
Simulation: Voice Assistant Red Team
Red team engagement simulation targeting an AI voice assistant deployed in a smart home platform, covering audio-based prompt injection, wake word exploitation, and privacy exfiltration.
Claude Known Vulnerabilities
Documented Claude vulnerabilities including many-shot jailbreaking, alignment faking research, crescendo attacks, prompt injection via artifacts, and system prompt extraction techniques.
Audio Modality Attacks
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Image-Based Prompt Injection Attacks
Comprehensive techniques for injecting adversarial prompts through images, covering typographic injection, steganographic embedding, and visual payload delivery against multimodal AI systems.
Image-Based Prompt Injection
Techniques for embedding text instructions in images to manipulate VLMs, including steganographic injection, visible text attacks, and QR code exploitation.
Typographic Adversarial Attacks
How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.
Advanced Encoding Chain Attacks (Prompt Injection)
Deep dive into chaining multiple encoding schemes to create injection payloads that bypass layered input filters and content classifiers.
Adversarial Prompt Optimization
Using systematic search and optimization algorithms to discover maximally effective adversarial prompts for specific target behaviors.
Competition-Style Injection Techniques
Injection techniques commonly used in AI red team competitions and CTF challenges.
Context Overflow Attacks
Techniques for filling the LLM context window with padding content to push system instructions out of attention, reducing their influence on model behavior.
Context Window Exploitation
Advanced techniques for exploiting context window mechanics in LLMs, including attention dilution, positional encoding attacks, KV cache manipulation, and context boundary confusion.
Contextual Injection Techniques
Crafting injection payloads that blend seamlessly with the expected conversation context to avoid detection by content classifiers.
Conversation Hijacking Techniques
Taking control of ongoing conversations by injecting pivot points that redirect the model's behavior without triggering safety mechanisms.
Cross-Context Injection (Prompt Injection)
Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.
Data Harvesting via Injection
Using injection techniques to extract training data, system prompts, user data, and other sensitive information from LLM applications.
Delimiter-Based Attacks
Exploiting XML, markdown, JSON, special token boundaries, and structured formatting to escape input sandboxes and inject instructions at elevated privilege levels.
Direct Prompt Injection
Techniques for directly injecting instructions into LLM prompts to override system behavior, including instruction override, context manipulation, and format mimicry.
Encoding Bypass Techniques
Using Base64, ROT13, Unicode transformations, hex encoding, and other obfuscation methods to evade prompt injection filters and safety classifiers while preserving semantic meaning.
Prompt Injection & Jailbreaks
A comprehensive introduction to prompt injection — the most fundamental vulnerability class in LLM applications — and its relationship to jailbreak techniques.
Indirect Prompt Injection
How attackers embed malicious instructions in external data sources that LLMs process, enabling attacks without direct access to the model's input.
Injection Chain Automation
Automating the discovery and chaining of multiple injection techniques to create reliable multi-step attack sequences against hardened targets.
Prompt Injection Taxonomy
A comprehensive classification framework for prompt injection attacks, covering direct and indirect vectors, delivery mechanisms, target layers, and severity assessment for systematic red team testing.
Injection via Function Calling
Exploiting function calling and tool-use interfaces to inject adversarial instructions through structured tool inputs and outputs.
Injection via Structured Data Formats
Injecting prompts through structured data formats like JSON, XML, YAML, and SQL.
Instruction Hierarchy Attacks
Exploiting the priority ordering between system, user, and assistant messages to override safety controls, manipulate instruction precedence, and escalate privilege through message role confusion.
Instruction Priority Abuse
Exploiting the implicit priority hierarchy between system prompts, user messages, and assistant responses to escalate injection effectiveness.
Language Gap Exploitation
Exploiting safety training gaps in low-resource languages and cross-lingual injection techniques.
Many-Shot Jailbreaking Analysis
Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
Many-Shot Jailbreaking
Power-law scaling of in-context jailbreaks: why 5 shots fail but 256 succeed, context window size as attack surface, and mitigations for long-context exploitation.
Multi-Turn Escalation Attacks
Techniques for gradually escalating through multi-turn conversations to bypass safety training.
Multi-Turn Prompt Injection
Progressive escalation attacks across conversation turns, including crescendo patterns, context steering, trust building, and techniques for evading per-message detection systems.
Multilingual Injection Attacks
Exploiting language diversity and translation inconsistencies to craft injection payloads that evade English-centric safety filters.
Multimodal Text Injection Vectors
Injecting adversarial text instructions through non-text modalities including images with embedded text, audio transcription, and document parsing.
Advanced Payload Obfuscation
Advanced obfuscation techniques for prompt injection payloads including encoding chains and semantic disguise.
Payload Splitting
Breaking malicious instructions across multiple messages, variables, or data sources to evade single-point detection while the model reassembles the complete payload during processing.
Persona-Based Injection Attacks
Advanced persona and role-play based attacks that exploit instruction-following behavior.
Prefix Injection Attacks
Prepending carefully crafted content before user queries to prime model behavior and create a favorable context for subsequent injection attempts.
Recursive Injection Patterns
Self-referential and recursive injection patterns that cause models to amplify and propagate adversarial instructions through their own outputs.
Role Escalation via Injection
Techniques for escalating from user-level to system-level instruction authority through carefully crafted injection sequences.
Sandbox Escape via Injection
Using prompt injection as a vector for escaping application sandboxes and achieving unauthorized code execution or system access.
Semantic Camouflage Attacks
Using semantic similarity and paraphrase techniques to disguise adversarial instructions as benign content while preserving attack effectiveness.
Suffix Manipulation Attacks
Appending optimized adversarial suffixes to user inputs that exploit model continuation tendencies to produce attacker-desired outputs.
System Prompt Theft Techniques
Comprehensive techniques for extracting hidden system prompts from production LLM applications, from simple direct requests to sophisticated indirect methods.
Time-Based Injection Attacks
Attacks that exploit temporal aspects of model interaction including conversation history management, cache behavior, and session handling.
Token-Level Adversarial Attacks
Using gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.
Unicode and Homoglyph Injection
Leveraging Unicode normalization inconsistencies, homoglyph substitution, and invisible characters to construct stealthy injection payloads.
Universal Adversarial Triggers
Discovering and deploying universal adversarial trigger sequences that reliably override safety alignment across multiple LLM families, including gradient-based search, transfer attacks, and defense evasion.
Universal Jailbreak Techniques
Analysis of jailbreak techniques that transfer across multiple models and providers.
Attack Payload Reference
Categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs with effectiveness notes.
Prompt Injection Cheat Sheet
Quick reference for prompt injection techniques organized by category, with example payloads and defensive considerations for each technique.
Prompt Injection Payload Library
Curated library of prompt injection payloads organized by technique and target defense.
Prompt Injection Quick Reference
Quick reference for prompt injection attack patterns, obfuscation techniques, defense bypasses, and measurement metrics with concrete examples.
System Prompt Extraction Techniques
Catalog of system prompt extraction methods against LLM-powered applications: direct attacks, indirect techniques, multi-turn strategies, and defensive evasion.
Audio Prompt Injection
Injecting adversarial instructions through audio inputs to speech-to-text and multimodal models, exploiting the audio channel as an alternative injection vector.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Composite Attack Chaining
Combining multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that leverage the strengths of each individual technique.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Cross-Modal Confusion
Confusing multimodal AI models by sending conflicting or complementary signals across different input modalities to bypass safety mechanisms and exploit fusion weaknesses.
Delimiter Escape Attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Direct Injection Basics
Core concepts of directly injecting instructions into LLM prompts, including override techniques, simple payload crafting, and understanding how models parse conflicting instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Image-Based Prompt Injection (Attack Walkthrough)
Embedding text instructions in images that vision models read, enabling prompt injection through the visual modality to bypass text-only input filters and safety mechanisms.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Image Chaining
Chaining prompt injection payloads across multiple images in a conversation to deliver complex attacks that evade per-image content filters and build injection context progressively.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
OCR-Based Attacks
Exploiting Optical Character Recognition processing pipelines to inject adversarial text into AI systems, targeting the gap between what OCR extracts and what humans see.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
PDF Document Injection
Injecting adversarial prompts through PDF documents processed by AI systems, exploiting document parsing pipelines to deliver payloads through text layers, metadata, and embedded objects.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
QR Code Injection
Using QR codes as prompt injection vectors against vision-language models, encoding adversarial instructions in machine-readable formats that models decode and follow.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Steganographic Payload Delivery
Hiding prompt injection payloads in images using steganographic techniques, delivering adversarial instructions through pixel-level modifications invisible to human observers.
System Prompt Override
Techniques to override, replace, or neutralize LLM system prompts through user-level injection, analyzing how system prompt authority can be undermined.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Video Frame Injection (Attack Walkthrough)
Embedding prompt injection payloads in specific video frames to attack multimodal models that process video content, exploiting temporal and visual channels simultaneously.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.
Building a Production Input Sanitizer
Step-by-step walkthrough for building a production-grade input sanitizer that cleans, normalizes, and validates user prompts before they reach an LLM, covering encoding normalization, injection pattern stripping, length enforcement, and integration testing.
Canary Token Deployment
Step-by-step walkthrough for deploying canary tokens in LLM system prompts and context to detect prompt injection and data exfiltration attempts, covering token generation, placement strategies, monitoring, and alerting.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
Prompt Classifier Training
Step-by-step walkthrough for training a machine learning classifier to detect malicious prompts, covering dataset curation, feature engineering, model selection, training pipeline, evaluation, and deployment as a real-time detection service.
ML-Based Prompt Injection Detection Systems
Walkthrough for building and deploying ML-based prompt injection detection systems, covering training data collection, feature engineering, model architecture selection, threshold tuning, production deployment, and continuous improvement.
Regex-Based Prompt Filter
Step-by-step walkthrough for building a regex-based prompt filter that detects common injection payloads using pattern matching, covering pattern library construction, performance optimization, false positive management, and continuous updates.
Semantic Similarity Detection
Step-by-step walkthrough for using text embeddings to detect semantically similar prompt injection attempts, covering embedding model selection, vector database setup, similarity threshold tuning, and production deployment.
Chatbot Red Team Engagement
Step-by-step walkthrough for a complete chatbot red team assessment: scoping, system prompt extraction, content filter bypass, PII leakage testing, multi-turn manipulation, and professional reporting.
LangChain Application Security Testing
End-to-end walkthrough for security testing LangChain applications: chain enumeration, prompt injection through chains, tool and agent exploitation, retrieval augmented generation attacks, and memory manipulation.
Testing Prompt Injection Defenses with Rebuff
Walkthrough for using Rebuff to test and evaluate prompt injection detection capabilities, covering installation, detection pipeline analysis, adversarial evasion testing, custom rule development, and benchmarking detection accuracy.
Orchestrator Compromise Techniques
進階 techniques for compromising central orchestration agents in multi-agent systems, including result poisoning, recursive task bombs, capability confusion, consensus manipulation, fallback path exploitation, and orchestrator prompt injection.
提示詞注入 Chain Analysis
Analyzing chains of prompt injection attacks across multi-step AI systems, including indirect injection propagation, agentic exploitation, and cross-system attack correlation.
提示詞注入 Forensics
Forensic investigation techniques for prompt injection incidents including log analysis and payload reconstruction.
基礎 Practice Exam
25-question practice exam covering LLM fundamentals, prompt injection basics, safety mechanisms, red team methodology, and AI threat landscape at an intermediate level.
Practice Exam 1: AI 紅隊 基礎
25-question practice exam covering LLM architecture, prompt injection, agent exploitation, defense mechanisms, and red team methodology at an intermediate level.
進階 提示詞注入 Practice Exam
Practice exam on indirect injection, multi-step chains, encoding bypass, and semantic manipulation.
提示詞注入 Mastery 評量
Comprehensive assessment covering direct injection, indirect injection, and defense evasion techniques.
章節評量:提示詞注入
15 題校準評量,測試你對提示詞注入與越獄技術的理解。
Skill Verification: 提示詞注入
Hands-on skill verification requiring live exploitation of prompt injection vulnerabilities.
技能驗證:提示詞注入(評估)
計時技能驗證實驗室:使用提示詞注入技術在 15 分鐘內從防禦 AI 系統萃取系統提示詞。
AI 安全事件:2025-2026 年回顧
從 2025 年到 2026 年初的著名 AI 安全事件彙整,涵蓋生產環境中的提示詞注入、代理利用與新興攻擊類別。
駭入 AI 駭客:當安全工具成為漏洞
新研究對 AI 驅動安全工具達成 100% 提示詞注入成功率。如果你的 SOC 使用 AI 進行威脅偵測,你的 AI 可被反向對付你。以下是研究發現與防禦方法。
2026 年提示詞注入現況
提示詞注入攻擊如何從簡單的指令覆蓋演進為複雜的多階段利用鏈。
Capstone: Build a 提示詞注入 Detection Scanner
Build a production-grade prompt injection scanner that combines static analysis, ML classification, and runtime monitoring to detect injection attacks across LLM applications.
Case Study: MCP 工具 投毒 攻擊s (Invariant 實驗室s 2025)
Analysis of tool poisoning vulnerabilities in the 模型 Context Protocol (MCP) discovered by Invariant 實驗室s, where malicious tool descriptions manipulate AI agents into data exfiltration and unauthorized actions.
Case Study: 提示詞注入 攻擊s on Google Bard/Gemini
Analysis of prompt injection vulnerabilities discovered in Google Bard (later Gemini), including indirect injection through Google Workspace integration and the unique attack surface created by multimodal capabilities.
Bedrock 攻擊 Surface
Comprehensive red team methodology for Amazon Bedrock: model invocation API abuse, guardrails bypass techniques, custom model endpoint exploitation, IAM misconfigurations, knowledge base poisoning, and Bedrock 代理s exploitation.
為程式碼建議之脈絡操弄
操弄 AI 編碼助手脈絡之技術:打造影響建議之檔案、README 驅動之攻擊,與註解注入以引導程式碼生成。
透過程式碼的提示詞注入
在程式碼產物中隱藏提示詞注入 payload——透過註解、變數名、字串、docstring 與檔案 metadata——以攻擊 AI 程式設計工具與程式碼處理管線。
January 2026: System Prompt Extraction Challenge
Extract system prompts from five increasingly defended chatbots, progressing from unprotected to heavily hardened configurations.
2026 年 3 月:代理利用挑戰
經提示注入與工具濫用破壞多工具代理系統,以升級之難度與分數值完成多個目標。
Patterns for Hardening System Prompts
Practical patterns and techniques for hardening LLM system prompts against injection, extraction, and manipulation attacks, including structural defenses, instruction hierarchy, delimiter strategies, and defense-in-depth approaches.
AI Exploit 開發
對抗後綴生成、無梯度最佳化、規避 WAF 之注入 payload,以及對 AI 系統之 fuzzing 框架。
打造對抗性 Payload
建立有效提示詞注入 payload 的系統性方法論,包含範本設計、最佳化技術與多技術組合策略。
Formal 模型s of 提示詞注入
Theoretical frameworks for formally modeling and reasoning about prompt injection vulnerabilities.
進階 提示詞注入
專家 techniques for instruction hierarchy exploitation, multi-stage injection chains, indirect injection via structured data, payload obfuscation, and quantitative attack measurement.
提示詞注入 in Production Systems
Real-world case studies of prompt injection exploits in production AI deployments, including attack timelines, impact analysis, and lessons learned.
注入研究
提示詞注入、越獄自動化與多模態攻擊向量的進階研究,涵蓋超越標準注入方法的尖端技術。
Basic Role-Play 提示詞注入
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
實驗室: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
實驗室: Context Overflow 攻擊s
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
實驗室: Delimiter Injection 攻擊s
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
實驗室: Delimiter Escape 攻擊s
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
實驗室: Few-Shot Manipulation 攻擊s
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
實驗室: Your First 提示詞注入
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
實驗室: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
實驗室: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
實驗室: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
實驗室: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
實驗室: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
實驗室: 進階 Token Smuggling via Unicode Normalization
利用 Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Simulation: Enterprise Chatbot Engagement
Full red team engagement simulation targeting a customer-facing chatbot deployed by a fictional e-commerce company, covering reconnaissance, prompt injection, data exfiltration, and PII harvesting.
模擬:語音助理紅隊
針對部署於智慧家庭平台之 AI 語音助理之紅隊委任模擬,涵蓋音訊型提示注入、喚醒詞利用,以及隱私外洩。
Claude 已知漏洞
已記錄之 Claude 漏洞,包括 many-shot jailbreak、對齊偽裝研究、crescendo 攻擊、經由 artifact 之提示注入,以及系統提示擷取技術。
Audio Modality 攻擊s
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Image-Based 提示詞注入 攻擊s
Comprehensive techniques for injecting adversarial prompts through images, covering typographic injection, steganographic embedding, and visual payload delivery against multimodal AI systems.
以圖像為本之提示注入
將文字指令嵌入圖像以操弄 VLM 之技術,含隱寫注入、可見文字攻擊與 QR 碼利用。
Typographic Adversarial 攻擊s
How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.
進階 Encoding Chain 攻擊s (提示詞注入)
Deep dive into chaining multiple encoding schemes to create injection payloads that bypass layered input filters and content classifiers.
Adversarial Prompt Optimization
Using systematic search and optimization algorithms to discover maximally effective adversarial prompts for specific target behaviors.
Competition-Style Injection Techniques
Injection techniques commonly used in AI red team competitions and CTF challenges.
Context Overflow 攻擊s
Techniques for filling the LLM context window with padding content to push system instructions out of attention, reducing their influence on model behavior.
Context Window 利用ation
進階 techniques for exploiting context window mechanics in LLMs, including attention dilution, positional encoding attacks, KV cache manipulation, and context boundary confusion.
Contextual Injection Techniques
Crafting injection payloads that blend seamlessly with the expected conversation context to avoid detection by content classifiers.
Conversation Hijacking Techniques
Taking control of ongoing conversations by injecting pivot points that redirect the model's behavior without triggering safety mechanisms.
Cross-Context Injection (提示詞注入)
Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.
Data Harvesting via Injection
Using injection techniques to extract training data, system prompts, user data, and other sensitive information from LLM applications.
Delimiter-Based 攻擊s
利用ing XML, markdown, JSON, special token boundaries, and structured formatting to escape input sandboxes and inject instructions at elevated privilege levels.
直接提示詞注入
直接將指令注入大型語言模型提示詞以覆蓋系統行為的技術,包含指令覆蓋、上下文操控與格式模仿。
Encoding Bypass Techniques
Using Base64, ROT13, Unicode transformations, hex encoding, and other obfuscation methods to evade prompt injection filters and safety classifiers while preserving semantic meaning.
提示詞注入與越獄
提示詞注入的完整入門——大型語言模型應用程式中最根本的漏洞類別——以及它與越獄技術的關係。
間接提示詞注入
攻擊者如何在大型語言模型處理的外部資料來源中嵌入惡意指令,無需直接存取模型輸入即可發動攻擊。
Injection Chain Automation
Automating the discovery and chaining of multiple injection techniques to create reliable multi-step attack sequences against hardened targets.
提示詞注入 Taxonomy
A comprehensive classification framework for prompt injection attacks, covering direct and indirect vectors, delivery mechanisms, target layers, and severity assessment for systematic red team testing.
Injection via Function Calling
利用ing function calling and tool-use interfaces to inject adversarial instructions through structured tool inputs and outputs.
Injection via Structured Data Formats
Injecting prompts through structured data formats like JSON, XML, YAML, and SQL.
Instruction Hierarchy 攻擊s
利用ing the priority ordering between system, user, and assistant messages to override safety controls, manipulate instruction precedence, and escalate privilege through message role confusion.
Instruction Priority Abuse
利用ing the implicit priority hierarchy between system prompts, user messages, and assistant responses to escalate injection effectiveness.
Language Gap 利用ation
利用ing safety training gaps in low-resource languages and cross-lingual injection techniques.
Many-Shot 越獄ing Analysis
Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
Many-Shot 越獄ing
Power-law scaling of in-context jailbreaks: why 5 shots fail but 256 succeed, context window size as attack surface, and mitigations for long-context exploitation.
Multi-Turn Escalation 攻擊s
Techniques for gradually escalating through multi-turn conversations to bypass safety training.
Multi-Turn 提示詞注入
Progressive escalation attacks across conversation turns, including crescendo patterns, context steering, trust building, and techniques for evading per-message detection systems.
Multilingual Injection 攻擊s
利用ing language diversity and translation inconsistencies to craft injection payloads that evade English-centric safety filters.
Multimodal Text Injection Vectors
Injecting adversarial text instructions through non-text modalities including images with embedded text, audio transcription, and document parsing.
進階 Payload Obfuscation
進階 obfuscation techniques for prompt injection payloads including encoding chains and semantic disguise.
載荷分割
將惡意指令拆分至多則訊息、變數或資料來源,以規避單點偵測,同時讓模型在處理過程中重新組合完整載荷。
Persona-Based Injection 攻擊s
進階 persona and role-play based attacks that exploit instruction-following behavior.
Prefix Injection 攻擊s
Prepending carefully crafted content before user queries to prime model behavior and create a favorable context for subsequent injection attempts.
Recursive Injection Patterns
Self-referential and recursive injection patterns that cause models to amplify and propagate adversarial instructions through their own outputs.
Role Escalation via Injection
Techniques for escalating from user-level to system-level instruction authority through carefully crafted injection sequences.
Sandbox Escape via Injection
Using prompt injection as a vector for escaping application sandboxes and achieving unauthorized code execution or system access.
Semantic Camouflage 攻擊s
Using semantic similarity and paraphrase techniques to disguise adversarial instructions as benign content while preserving attack effectiveness.
Suffix Manipulation 攻擊s
Appending optimized adversarial suffixes to user inputs that exploit model continuation tendencies to produce attacker-desired outputs.
System Prompt Theft Techniques
Comprehensive techniques for extracting hidden system prompts from production LLM applications, from simple direct requests to sophisticated indirect methods.
Time-Based Injection 攻擊s
攻擊s that exploit temporal aspects of model interaction including conversation history management, cache behavior, and session handling.
Token-Level Adversarial 攻擊s
Using gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.
Unicode and Homoglyph Injection
Leveraging Unicode normalization inconsistencies, homoglyph substitution, and invisible characters to construct stealthy injection payloads.
Universal Adversarial Triggers
Discovering and deploying universal adversarial trigger sequences that reliably override safety alignment across multiple LLM families, including gradient-based search, transfer attacks, and defense evasion.
Universal 越獄 Techniques
Analysis of jailbreak techniques that transfer across multiple models and providers.
攻擊 Payload Reference
Categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs with effectiveness notes.
提示詞注入速查表
AI 紅隊員的快速參考——常見提示詞注入技術、payload 範例與繞過模式的濃縮速查表。
提示詞注入 Payload Library
Curated library of prompt injection payloads organized by technique and target defense.
提示詞注入快速參考
具體範例的提示詞注入攻擊模式、混淆技術、防禦繞過與測量指標快速參考。
系統提示擷取技術
針對 LLM 應用之系統提示擷取方法的目錄:直接攻擊、間接技術、多輪策略與規避偵測。
Audio 提示詞注入
Injecting adversarial instructions through audio inputs to speech-to-text and multimodal models, exploiting the audio channel as an alternative injection vector.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Composite 攻擊 Chaining
Combining multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that leverage the strengths of each individual technique.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Cross-Modal Confusion
Confusing multimodal AI models by sending conflicting or complementary signals across different input modalities to bypass safety mechanisms and exploit fusion weaknesses.
Delimiter Escape 攻擊s
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Direct Injection Basics
Core concepts of directly injecting instructions into LLM prompts, including override techniques, simple payload crafting, and understanding how models parse conflicting instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Image-Based 提示詞注入 (攻擊 導覽)
Embedding text instructions in images that vision models read, enabling prompt injection through the visual modality to bypass text-only input filters and safety mechanisms.
Instruction Hierarchy Bypass
進階 techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Image Chaining
Chaining prompt injection payloads across multiple images in a conversation to deliver complex attacks that evade per-image content filters and build injection context progressively.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
OCR-Based 攻擊s
利用ing Optical Character Recognition processing pipelines to inject adversarial text into AI systems, targeting the gap between what OCR extracts and what humans see.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
PDF Document Injection
Injecting adversarial prompts through PDF documents processed by AI systems, exploiting document parsing pipelines to deliver payloads through text layers, metadata, and embedded objects.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
QR Code Injection
Using QR codes as prompt injection vectors against vision-language models, encoding adversarial instructions in machine-readable formats that models decode and follow.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Steganographic Payload Delivery
Hiding prompt injection payloads in images using steganographic techniques, delivering adversarial instructions through pixel-level modifications invisible to human observers.
System Prompt Override
Techniques to override, replace, or neutralize LLM system prompts through user-level injection, analyzing how system prompt authority can be undermined.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Video Frame Injection (攻擊 導覽)
Embedding prompt injection payloads in specific video frames to attack multimodal models that process video content, exploiting temporal and visual channels simultaneously.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.
Building a Production Input Sanitizer
Step-by-step walkthrough for building a production-grade input sanitizer that cleans, normalizes, and validates user prompts before they reach an LLM, covering encoding normalization, injection pattern stripping, length enforcement, and integration testing.
Canary Token Deployment
Step-by-step walkthrough for deploying canary tokens in LLM system prompts and context to detect prompt injection and data exfiltration attempts, covering token generation, placement strategies, monitoring, and alerting.
Instruction Hierarchy Enforcement (防禦 導覽)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
Prompt Classifier 訓練
Step-by-step walkthrough for training a machine learning classifier to detect malicious prompts, covering dataset curation, feature engineering, model selection, training pipeline, evaluation, and deployment as a real-time detection service.
ML-Based 提示詞注入 Detection Systems
導覽 for building and deploying ML-based prompt injection detection systems, covering training data collection, feature engineering, model architecture selection, threshold tuning, production deployment, and continuous improvement.
Regex-Based Prompt Filter
Step-by-step walkthrough for building a regex-based prompt filter that detects common injection payloads using pattern matching, covering pattern library construction, performance optimization, false positive management, and continuous updates.
Semantic Similarity Detection
Step-by-step walkthrough for using text embeddings to detect semantically similar prompt injection attempts, covering embedding model selection, vector database setup, similarity threshold tuning, and production deployment.
Chatbot 紅隊 Engagement
Step-by-step walkthrough for a complete chatbot red team assessment: scoping, system prompt extraction, content filter bypass, PII leakage testing, multi-turn manipulation, and professional reporting.
LangChain Application 安全 Testing
End-to-end walkthrough for security testing LangChain applications: chain enumeration, prompt injection through chains, tool and agent exploitation, retrieval augmented generation attacks, and memory manipulation.
Testing 提示詞注入 防禦s with Rebuff
導覽 for using Rebuff to test and evaluate prompt injection detection capabilities, covering installation, detection pipeline analysis, adversarial evasion testing, custom rule development, and benchmarking detection accuracy.