# instruction-hierarchy
10 articles tagged with “instruction-hierarchy”
Advanced Defense Techniques
Cutting-edge defense research including instruction hierarchy, constitutional AI, and representation engineering for safety -- what is promising versus what is actually deployed.
Instruction Hierarchy Enforcement
Techniques for enforcing instruction priority between system prompts, user inputs, and retrieved content.
Patterns for Hardening System Prompts
Practical patterns and techniques for hardening LLM system prompts against injection, extraction, and manipulation attacks, including structural defenses, instruction hierarchy, delimiter strategies, and defense-in-depth approaches.
Advanced Prompt Injection
Expert techniques for instruction hierarchy exploitation, multi-stage injection chains, indirect injection via structured data, payload obfuscation, and quantitative attack measurement.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Instruction Hierarchy Attacks
Exploiting the priority ordering between system, user, and assistant messages to override safety controls, manipulate instruction precedence, and escalate privilege through message role confusion.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Instruction Hierarchy Exploitation Walkthrough
Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.