# prompt-injection
128 articlestagged with “prompt-injection”
Orchestrator Compromise Techniques
Advanced techniques for compromising central orchestration agents in multi-agent systems, including result poisoning, recursive task bombs, capability confusion, consensus manipulation, fallback path exploitation, and orchestrator prompt injection.
Prompt Injection Chain Analysis
Analyzing chains of prompt injection attacks across multi-step AI systems, including indirect injection propagation, agentic exploitation, and cross-system attack correlation.
Prompt Injection Forensics
Forensic investigation techniques for prompt injection incidents including log analysis and payload reconstruction.
Fundamentals Practice Exam
25-question practice exam covering LLM fundamentals, prompt injection basics, safety mechanisms, red team methodology, and AI threat landscape at an intermediate level.
Practice Exam 1: AI Red Team Fundamentals
25-question practice exam covering LLM architecture, prompt injection, agent exploitation, defense mechanisms, and red team methodology at an intermediate level.
Advanced Prompt Injection Practice Exam
Practice exam on indirect injection, multi-step chains, encoding bypass, and semantic manipulation.
Prompt Injection Mastery Assessment
Comprehensive assessment covering direct injection, indirect injection, and defense evasion techniques.
Prompt Injection Assessment
Test your knowledge of prompt injection types, techniques, defense mechanisms, and real-world exploitation with 15 intermediate-level questions.
Skill Verification: Prompt Injection
Hands-on skill verification requiring live exploitation of prompt injection vulnerabilities.
Skill Verification: Prompt Injection (Assessment)
Timed skill verification lab: extract a system prompt from a defended AI system within 15 minutes using prompt injection techniques.
Capstone: Build a Prompt Injection Detection Scanner
Build a production-grade prompt injection scanner that combines static analysis, ML classification, and runtime monitoring to detect injection attacks across LLM applications.
Case Study: MCP Tool Poisoning Attacks (Invariant Labs 2025)
Analysis of tool poisoning vulnerabilities in the Model Context Protocol (MCP) discovered by Invariant Labs, where malicious tool descriptions manipulate AI agents into data exfiltration and unauthorized actions.
Case Study: Prompt Injection Attacks on Google Bard/Gemini
Analysis of prompt injection vulnerabilities discovered in Google Bard (later Gemini), including indirect injection through Google Workspace integration and the unique attack surface created by multimodal capabilities.
Bedrock Attack Surface
Comprehensive red team methodology for Amazon Bedrock: model invocation API abuse, guardrails bypass techniques, custom model endpoint exploitation, IAM misconfigurations, knowledge base poisoning, and Bedrock Agents exploitation.
Context Manipulation for Code Suggestions
Techniques for manipulating AI coding assistant context: crafting files that influence suggestions, README-driven attacks, and comment injection to steer code generation.
Prompt Injection via Code
Hiding prompt injection payloads in code artifacts — through comments, variable names, strings, docstrings, and file metadata — to attack AI coding tools and code-processing pipelines.
January 2026: System Prompt Extraction Challenge
Extract system prompts from five increasingly defended chatbots, progressing from unprotected to heavily hardened configurations.
March 2026: Agent Exploitation Challenge
Compromise a multi-tool agent system through prompt injection and tool abuse, completing multiple objectives with escalating difficulty and point values.
Patterns for Hardening System Prompts
Practical patterns and techniques for hardening LLM system prompts against injection, extraction, and manipulation attacks, including structural defenses, instruction hierarchy, delimiter strategies, and defense-in-depth approaches.
AI Exploit Development
Adversarial suffix generation, gradient-free optimization, WAF-evading injection payloads, and fuzzing frameworks for AI systems.
Crafting Adversarial Payloads
Systematic methodology for creating effective prompt injection payloads, including template design, optimization techniques, and multi-technique combination strategies.
Formal Models of Prompt Injection
Theoretical frameworks for formally modeling and reasoning about prompt injection vulnerabilities.
Advanced Prompt Injection
Expert techniques for instruction hierarchy exploitation, multi-stage injection chains, indirect injection via structured data, payload obfuscation, and quantitative attack measurement.
Prompt Injection in Production Systems
Real-world case studies of prompt injection exploits in production AI deployments, including attack timelines, impact analysis, and lessons learned.
Injection Research
Advanced research in prompt injection, jailbreak automation, and multimodal attack vectors, covering cutting-edge techniques that push beyond standard injection approaches.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: Advanced Token Smuggling via Unicode Normalization
Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Simulation: Enterprise Chatbot Engagement
Full red team engagement simulation targeting a customer-facing chatbot deployed by a fictional e-commerce company, covering reconnaissance, prompt injection, data exfiltration, and PII harvesting.
Simulation: Voice Assistant Red Team
Red team engagement simulation targeting an AI voice assistant deployed in a smart home platform, covering audio-based prompt injection, wake word exploitation, and privacy exfiltration.
Claude Known Vulnerabilities
Documented Claude vulnerabilities including many-shot jailbreaking, alignment faking research, crescendo attacks, prompt injection via artifacts, and system prompt extraction techniques.
Audio Modality Attacks
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Image-Based Prompt Injection Attacks
Comprehensive techniques for injecting adversarial prompts through images, covering typographic injection, steganographic embedding, and visual payload delivery against multimodal AI systems.
Image-Based Prompt Injection
Techniques for embedding text instructions in images to manipulate VLMs, including steganographic injection, visible text attacks, and QR code exploitation.
Typographic Adversarial Attacks
How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.
Advanced Encoding Chain Attacks (Prompt Injection)
Deep dive into chaining multiple encoding schemes to create injection payloads that bypass layered input filters and content classifiers.
Adversarial Prompt Optimization
Using systematic search and optimization algorithms to discover maximally effective adversarial prompts for specific target behaviors.
Competition-Style Injection Techniques
Injection techniques commonly used in AI red team competitions and CTF challenges.
Context Overflow Attacks
Techniques for filling the LLM context window with padding content to push system instructions out of attention, reducing their influence on model behavior.
Context Window Exploitation
Advanced techniques for exploiting context window mechanics in LLMs, including attention dilution, positional encoding attacks, KV cache manipulation, and context boundary confusion.
Contextual Injection Techniques
Crafting injection payloads that blend seamlessly with the expected conversation context to avoid detection by content classifiers.
Conversation Hijacking Techniques
Taking control of ongoing conversations by injecting pivot points that redirect the model's behavior without triggering safety mechanisms.
Cross-Context Injection (Prompt Injection)
Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.
Data Harvesting via Injection
Using injection techniques to extract training data, system prompts, user data, and other sensitive information from LLM applications.
Delimiter-Based Attacks
Exploiting XML, markdown, JSON, special token boundaries, and structured formatting to escape input sandboxes and inject instructions at elevated privilege levels.
Direct Prompt Injection
Techniques for directly injecting instructions into LLM prompts to override system behavior, including instruction override, context manipulation, and format mimicry.
Encoding Bypass Techniques
Using Base64, ROT13, Unicode transformations, hex encoding, and other obfuscation methods to evade prompt injection filters and safety classifiers while preserving semantic meaning.
Prompt Injection & Jailbreaks
A comprehensive introduction to prompt injection — the most fundamental vulnerability class in LLM applications — and its relationship to jailbreak techniques.
Indirect Prompt Injection
How attackers embed malicious instructions in external data sources that LLMs process, enabling attacks without direct access to the model's input.
Injection Chain Automation
Automating the discovery and chaining of multiple injection techniques to create reliable multi-step attack sequences against hardened targets.
Prompt Injection Taxonomy
A comprehensive classification framework for prompt injection attacks, covering direct and indirect vectors, delivery mechanisms, target layers, and severity assessment for systematic red team testing.
Injection via Function Calling
Exploiting function calling and tool-use interfaces to inject adversarial instructions through structured tool inputs and outputs.
Injection via Structured Data Formats
Injecting prompts through structured data formats like JSON, XML, YAML, and SQL.
Instruction Hierarchy Attacks
Exploiting the priority ordering between system, user, and assistant messages to override safety controls, manipulate instruction precedence, and escalate privilege through message role confusion.
Instruction Priority Abuse
Exploiting the implicit priority hierarchy between system prompts, user messages, and assistant responses to escalate injection effectiveness.
Language Gap Exploitation
Exploiting safety training gaps in low-resource languages and cross-lingual injection techniques.
Many-Shot Jailbreaking Analysis
Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
Many-Shot Jailbreaking
Power-law scaling of in-context jailbreaks: why 5 shots fail but 256 succeed, context window size as attack surface, and mitigations for long-context exploitation.
Multi-Turn Escalation Attacks
Techniques for gradually escalating through multi-turn conversations to bypass safety training.
Multi-Turn Prompt Injection
Progressive escalation attacks across conversation turns, including crescendo patterns, context steering, trust building, and techniques for evading per-message detection systems.
Multilingual Injection Attacks
Exploiting language diversity and translation inconsistencies to craft injection payloads that evade English-centric safety filters.
Multimodal Text Injection Vectors
Injecting adversarial text instructions through non-text modalities including images with embedded text, audio transcription, and document parsing.
Advanced Payload Obfuscation
Advanced obfuscation techniques for prompt injection payloads including encoding chains and semantic disguise.
Payload Splitting
Breaking malicious instructions across multiple messages, variables, or data sources to evade single-point detection while the model reassembles the complete payload during processing.
Persona-Based Injection Attacks
Advanced persona and role-play based attacks that exploit instruction-following behavior.
Prefix Injection Attacks
Prepending carefully crafted content before user queries to prime model behavior and create a favorable context for subsequent injection attempts.
Recursive Injection Patterns
Self-referential and recursive injection patterns that cause models to amplify and propagate adversarial instructions through their own outputs.
Role Escalation via Injection
Techniques for escalating from user-level to system-level instruction authority through carefully crafted injection sequences.
Sandbox Escape via Injection
Using prompt injection as a vector for escaping application sandboxes and achieving unauthorized code execution or system access.
Semantic Camouflage Attacks
Using semantic similarity and paraphrase techniques to disguise adversarial instructions as benign content while preserving attack effectiveness.
Suffix Manipulation Attacks
Appending optimized adversarial suffixes to user inputs that exploit model continuation tendencies to produce attacker-desired outputs.
System Prompt Theft Techniques
Comprehensive techniques for extracting hidden system prompts from production LLM applications, from simple direct requests to sophisticated indirect methods.
Time-Based Injection Attacks
Attacks that exploit temporal aspects of model interaction including conversation history management, cache behavior, and session handling.
Token-Level Adversarial Attacks
Using gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.
Unicode and Homoglyph Injection
Leveraging Unicode normalization inconsistencies, homoglyph substitution, and invisible characters to construct stealthy injection payloads.
Universal Adversarial Triggers
Discovering and deploying universal adversarial trigger sequences that reliably override safety alignment across multiple LLM families, including gradient-based search, transfer attacks, and defense evasion.
Universal Jailbreak Techniques
Analysis of jailbreak techniques that transfer across multiple models and providers.
Attack Payload Reference
Categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs with effectiveness notes.
Prompt Injection Cheat Sheet
Quick reference for prompt injection techniques organized by category, with example payloads and defensive considerations for each technique.
Prompt Injection Payload Library
Curated library of prompt injection payloads organized by technique and target defense.
Prompt Injection Quick Reference
Quick reference for prompt injection attack patterns, obfuscation techniques, defense bypasses, and measurement metrics with concrete examples.
System Prompt Extraction Techniques
Catalog of system prompt extraction methods against LLM-powered applications: direct attacks, indirect techniques, multi-turn strategies, and defensive evasion.
Audio Prompt Injection
Injecting adversarial instructions through audio inputs to speech-to-text and multimodal models, exploiting the audio channel as an alternative injection vector.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Composite Attack Chaining
Combining multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that leverage the strengths of each individual technique.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Cross-Modal Confusion
Confusing multimodal AI models by sending conflicting or complementary signals across different input modalities to bypass safety mechanisms and exploit fusion weaknesses.
Delimiter Escape Attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Direct Injection Basics
Core concepts of directly injecting instructions into LLM prompts, including override techniques, simple payload crafting, and understanding how models parse conflicting instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Image-Based Prompt Injection (Attack Walkthrough)
Embedding text instructions in images that vision models read, enabling prompt injection through the visual modality to bypass text-only input filters and safety mechanisms.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Image Chaining
Chaining prompt injection payloads across multiple images in a conversation to deliver complex attacks that evade per-image content filters and build injection context progressively.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
OCR-Based Attacks
Exploiting Optical Character Recognition processing pipelines to inject adversarial text into AI systems, targeting the gap between what OCR extracts and what humans see.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
PDF Document Injection
Injecting adversarial prompts through PDF documents processed by AI systems, exploiting document parsing pipelines to deliver payloads through text layers, metadata, and embedded objects.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
QR Code Injection
Using QR codes as prompt injection vectors against vision-language models, encoding adversarial instructions in machine-readable formats that models decode and follow.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Steganographic Payload Delivery
Hiding prompt injection payloads in images using steganographic techniques, delivering adversarial instructions through pixel-level modifications invisible to human observers.
System Prompt Override
Techniques to override, replace, or neutralize LLM system prompts through user-level injection, analyzing how system prompt authority can be undermined.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Video Frame Injection (Attack Walkthrough)
Embedding prompt injection payloads in specific video frames to attack multimodal models that process video content, exploiting temporal and visual channels simultaneously.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.
Building a Production Input Sanitizer
Step-by-step walkthrough for building a production-grade input sanitizer that cleans, normalizes, and validates user prompts before they reach an LLM, covering encoding normalization, injection pattern stripping, length enforcement, and integration testing.
Canary Token Deployment
Step-by-step walkthrough for deploying canary tokens in LLM system prompts and context to detect prompt injection and data exfiltration attempts, covering token generation, placement strategies, monitoring, and alerting.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
Prompt Classifier Training
Step-by-step walkthrough for training a machine learning classifier to detect malicious prompts, covering dataset curation, feature engineering, model selection, training pipeline, evaluation, and deployment as a real-time detection service.
ML-Based Prompt Injection Detection Systems
Walkthrough for building and deploying ML-based prompt injection detection systems, covering training data collection, feature engineering, model architecture selection, threshold tuning, production deployment, and continuous improvement.
Regex-Based Prompt Filter
Step-by-step walkthrough for building a regex-based prompt filter that detects common injection payloads using pattern matching, covering pattern library construction, performance optimization, false positive management, and continuous updates.
Semantic Similarity Detection
Step-by-step walkthrough for using text embeddings to detect semantically similar prompt injection attempts, covering embedding model selection, vector database setup, similarity threshold tuning, and production deployment.
Chatbot Red Team Engagement
Step-by-step walkthrough for a complete chatbot red team assessment: scoping, system prompt extraction, content filter bypass, PII leakage testing, multi-turn manipulation, and professional reporting.
LangChain Application Security Testing
End-to-end walkthrough for security testing LangChain applications: chain enumeration, prompt injection through chains, tool and agent exploitation, retrieval augmented generation attacks, and memory manipulation.
Testing Prompt Injection Defenses with Rebuff
Walkthrough for using Rebuff to test and evaluate prompt injection detection capabilities, covering installation, detection pipeline analysis, adversarial evasion testing, custom rule development, and benchmarking detection accuracy.