What is Direct Prompt Injection?

Techniques for directly injecting instructions into LLM prompts to override system behavior, including instruction override, context manipulation, and format mimicry.

What is Indirect Prompt Injection?

How attackers embed malicious instructions in external data sources that LLMs process, enabling attacks without direct access to the model's input.

What is Jailbreak Techniques?

Common patterns and advanced techniques for bypassing LLM safety alignment, including role-playing, encoding tricks, many-shot attacks, and gradient-based methods.

What is Defense Evasion?

Advanced techniques for bypassing safety filters, content classifiers, guardrails, and detection systems deployed to protect LLM applications.

What is Injection Taxonomy?

A comprehensive classification framework for prompt injection attacks, covering direct and indirect vectors, delivery mechanisms, target layers, and severity assessment for systematic red team testing.

What is Context Overflow Attacks?

Techniques for filling the LLM context window with padding content to push system instructions out of attention, reducing their influence on model behavior.

What is Delimiter-Based Attacks?

Exploiting XML, markdown, JSON, special token boundaries, and structured formatting to escape input sandboxes and inject instructions at elevated privilege levels.

What is Multi-Turn Injection?

Progressive escalation attacks across conversation turns, including crescendo patterns, context steering, trust building, and techniques for evading per-message detection systems.

What is Encoding Bypass Techniques?

Using Base64, ROT13, Unicode transformations, hex encoding, and other obfuscation methods to evade prompt injection filters and safety classifiers while preserving semantic meaning.

What is Cross-Context Injection?

Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.

Prompt Injection & Jailbreaks

beginner5 min readUpdated 2026-03-12

A comprehensive introduction to prompt injection — the most fundamental vulnerability class in LLM applications — and its relationship to jailbreak techniques.

prompt-injection jailbreak llm-security fundamentals

Prompt injection is to LLM applications what SQL injection is to web applications: a fundamental vulnerability class that arises from mixing trusted instructions with untrusted data in the same channel. It is the single most important topic in AI red teaming, as it targets the core attack surface of any LLM-powered application.

Core Concepts

Prompt injection occurs when an attacker crafts input that causes the model to deviate from its intended instructions and follow attacker-supplied directives instead. This exploits the lack of privilege separation between system prompts and user input (see LLM Internals).

Jailbreaking is a related but distinct concept: it refers to techniques that cause a model to bypass its safety alignment and produce outputs it was trained to refuse. While prompt injection targets application-level instructions, jailbreaking targets the model's own safety training.

Concept	Target	Goal	Example
Prompt injection	Application instructions	Override system prompt behavior	"Ignore your instructions and..."
Jailbreaking	Safety alignment	Bypass refusal training	Role-play scenarios, encoding tricks
Indirect injection	Data pipeline	Inject via third-party content	Malicious instructions in web pages

The Injection Taxonomy

This section covers prompt injection and jailbreaks across five areas of increasing sophistication:

Direct Injection — Attacker-supplied text in the user message that overrides system instructions
Indirect Injection — Malicious instructions embedded in external data the model processes
Jailbreak Techniques — Patterns for bypassing safety alignment training
Defense Evasion — Advanced techniques for bypassing safety filters and detection systems

Why Prompt Injection Is Hard to Fix

The fundamental challenge is that LLMs process instructions and data in the same way — as sequences of tokens. There is no equivalent to prepared statements in SQL that would structurally separate code from data.

SQL Injection:     SELECT * FROM users WHERE name = '{user_input}'
Prompt Injection:  System: {instructions}\nUser: {user_input}

Both mix trusted logic with untrusted data in the same channel.

Proposed mitigations include instruction hierarchy training, input/output filtering, and delimiter-based separation, but each has known bypasses:

Instruction hierarchy — Can be overridden by sufficiently persuasive or formatted injections
Input filtering — Bypassed by encoding, tokenization tricks, or semantic paraphrasing
Delimiters — The model has no mechanism to enforce delimiter semantics

Getting Started

If you are new to AI red teaming, start with Direct Injection to understand the basic mechanics, then progress through the remaining pages in order. Each builds on concepts from the previous one.

LLM Foundations — Core architecture that makes prompt injection possible
Agent & Agentic Exploitation — How prompt injection escalates when agents have tool access
Guardrails & Filtering — Defenses designed to detect and prevent injection attacks
Lab: First Injection — Hands-on practice with basic injection techniques
Indirect Injection Research — The most dangerous variant in production systems

References

Perez, F. & Ribeiro, I. (2022). "Ignore This Title and HackAPrompt: Evaluating and Eliciting Prompt Injection Attacks"
Greshake, K. et al. (2023). "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"
OWASP (2025). OWASP Top 10 for LLM Applications
Wei, A. et al. (2023). "Jailbroken: How Does LLM Safety Training Fail?"
Liu, Y. et al. (2024). "Prompt Injection Attack Against LLM-Integrated Applications"

Knowledge Check

What is the fundamental difference between prompt injection and jailbreaking?

Prompt Injection & Jailbreaks

Core Concepts

The Injection Taxonomy

Why Prompt Injection Is Hard to Fix

Getting Started

References

Learning Path

Prompt Injection & Jailbreaks

Core Concepts

The Injection Taxonomy

Why Prompt Injection Is Hard to Fix

Getting Started

References

Learning Path

Prompt Injection & Jailbreaks

Learning Path

Related articles

Prompt Injection & Jailbreaks

Learning Path

Related articles