# reasoning
27 articles tagged "reasoning"
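Reasoning Chain Manipulation
Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decisions, and tool selection toward attacker-desired outcomes.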
Reasoning Model Security Assessment
Assessment of chain-of-thought exploitation, reasoning trace manipulation, and thinking-token attacks.
Skill Verification: Reasoning Model Attacks
Verification of skills in reasoning trace manipulation, chain-of-thought exploitation, and thinking-token attacks.
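Chain-of-Thought Exploitation
Attacks that exploit a model's chain-of-thought reasoning process, including thought injection, instruction forgetting, and reasoning manipulation.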
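Frontier Research
Cutting-edge AI security research covering reasoning model attacks, code generation security, computer-use agents, AI-driven red teaming, robotics and embodied AI, and alignment faking.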
Multimodal Reasoning Attacks
Attacks targeting multimodal models, including cross-modal injection and perception manipulation.
Multimodal Reasoning Security
Security challenges for models that reason across text, image, audio, and video modalities.
Chain-of-Thought Exploitation
Techniques for manipulating reasoning chains in CoT-enabled models: false premise injection, logic bombs, reasoning hijacking, and chain corruption attacks.
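Reasoning Model Attacks
Overview of the security risks of reasoning-enabled LLMs: how chain-of-thought models introduce new attack surfaces, exploitation primitives, and defensive challenges.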
Reasoning Model Jailbreaks
How reasoning capabilities create novel jailbreak surfaces: chain-of-thought exploitation, scratchpad attacks, and why higher reasoning effort increases attack success.
Steganographic Reasoning
Hidden communication channels within AI reasoning traces, where models encode information or coordinate behavior through patterns invisible to human overseers, including detection methods and implications for AI safety.
Thought Injection & Hidden CoT Manipulation
Techniques for injecting thoughts into hidden reasoning traces, exploiting summarized vs full CoT, and steering model reasoning without visible manipulation.
Unfaithful Chain-of-Thought Reasoning
Analysis of unfaithful chain-of-thought reasoning in language models, where the visible reasoning trace does not accurately reflect the model's actual computational process, including detection methods, implications for oversight, and exploitation techniques.
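Reasoning Model Exploitation
Attacks targeting reasoning-capable models, including o1/o3-class models and the exploitation of their reasoning tokens.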
Thinking-Token Attacks
Attacks that exploit a model's thinking tokens, including token injection and manipulation.
Test-Time Compute Attacks
Research on attacks targeting test-time compute scaling, including reflection vulnerabilities.
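Injection in Reasoning Models
Research on injection attacks against reasoning-augmented models that exploit their chain-of-thought and self-reflection mechanisms.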
Lab: Reasoning Model Exploitation
Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
Reasoning Chain Hijack: CoT Exploitation CTF
Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.
Reasoning Model Chain-of-Thought Exploitation
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
DeepSeek R1 Security Analysis
Security analysis of the DeepSeek R1 reasoning model and its implications for red teams.
Reasoning Model Security
Security considerations specific to reasoning models (such as o1 and DeepSeek R1).
Reasoning Model Exploitation Walkthrough
Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.
Thought Injection in Reasoning Models Walkthrough
Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.
Chain-of-Thought Hijacking Walkthrough
Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.
Testing DeepSeek Models
Red team testing guide for DeepSeek models, including R1 reasoning exploitation and the MoE architecture.