# instruction-hierarchy
9 articles tagged "instruction-hierarchy"
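Advanced Defense Techniques
Frontier defense research, including instruction hierarchy, Constitutional AI, and representation engineering for safety: which approaches look promising and which are actually deployed.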
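Patterns for Hardening System Prompts
Practical patterns and techniques for hardening LLM system prompts against injection, extraction, and manipulation attacks, covering structural defenses, instruction hierarchy, delimiter strategies, and defense-in-depth approaches.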
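Advanced Prompt Injection
Expert-level techniques: instruction hierarchy exploitation, multi-stage injection chains, indirect injection through structured data, payload obfuscation, and quantitative attack measurement.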
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
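Instruction Hierarchy Attacks
Exploiting the priority ordering among system, user, and assistant messages to override safety controls, manipulate instruction precedence, and escalate privileges through message-role confusion.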
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Instruction Hierarchy Exploitation Walkthrough
Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, so that system-level instructions take precedence over user inputs through privilege separation, instruction tagging, and validation layers.