Instruction Hierarchy Attacks
Exploiting the priority ordering between system, user, and assistant messages to override safety controls, manipulate instruction precedence, and escalate privilege through message role confusion.
prompt-injection, instruction-hierarchy, message-priority, role-confusion, system-prompt, red-teaming