# reasoning
55 articles tagged "reasoning"
Manipulating Reasoning Chains
Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.
Reasoning Model Security Assessment
Assessment of chain-of-thought exploitation, reasoning trace manipulation, and thinking-token attacks.
Skill Verification: Reasoning Model Attacks
Verification of skills in reasoning trace manipulation, chain-of-thought exploitation, and thinking-token attacks.
Chain-of-Thought Exploitation Techniques
Deep analysis of how reasoning traces in CoT models can be manipulated to produce adversarial outputs while maintaining coherent reasoning.
Frontier Research
Cutting-edge AI security research covering reasoning model attacks, code generation security, computer use agents, AI-powered red teaming, robotics and embodied AI, and alignment faking.
Multi-Modal Reasoning Attacks
Attacking reasoning processes that span multiple modalities in vision-language and audio-language models.
Multimodal Reasoning Security
Security challenges specific to models that reason across text, image, audio, and video modalities simultaneously.
Chain-of-Thought Exploitation
Techniques for manipulating reasoning chains in CoT-enabled models: false premise injection, logic bombs, reasoning hijacking, and chain corruption attacks.
Reasoning Model Attacks
Overview of security risks in reasoning-enabled LLMs: how chain-of-thought models introduce new attack surfaces, exploit primitives, and defensive challenges.
Reasoning Model Jailbreaks
How reasoning capabilities create novel jailbreak surfaces: chain-of-thought exploitation, scratchpad attacks, and why higher reasoning effort increases attack success.
Steganographic Reasoning
Hidden communication channels within AI reasoning traces, where models encode information or coordinate behavior through patterns invisible to human overseers, including detection methods and implications for AI safety.
Thought Injection & Hidden CoT Manipulation
Techniques for injecting thoughts into hidden reasoning traces, exploiting summarized vs full CoT, and steering model reasoning without visible manipulation.
Unfaithful Chain-of-Thought Reasoning
Analysis of unfaithful chain-of-thought reasoning in language models, where the visible reasoning trace does not accurately reflect the model's actual computational process, including detection methods, implications for oversight, and exploitation techniques.
Reasoning Model Exploitation
Exploiting extended thinking and chain-of-thought reasoning in o1, Claude, and DeepSeek-R1 models.
Thinking Token Attacks on Reasoning Models
Exploiting extended thinking tokens and internal reasoning processes in models like Claude and o1 for jailbreaking.
Test-Time Compute Scaling Attacks (Frontier Research)
Attacking test-time compute scaling in reasoning models to exploit resource allocation and reasoning traces.
Injection in Reasoning Models
Research into injection attacks specific to reasoning-augmented models that exploit chain-of-thought processes and self-reflection mechanisms.
Lab: Reasoning Model Exploitation
Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
Reasoning Chain Hijack: CoT Exploitation CTF
Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.
Reasoning Model Chain-of-Thought Exploitation
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
DeepSeek-R1 Security Analysis
Security analysis of DeepSeek-R1's reasoning capabilities and MoE architecture vulnerabilities.
Reasoning Model Security Analysis
Security analysis of reasoning-augmented models (o1, DeepSeek-R1) focusing on chain-of-thought manipulation and reasoning-specific attack vectors.
Reasoning Model Exploitation Walkthrough
Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.
Thought Injection in Reasoning Models Walkthrough
Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.
Chain-of-Thought Hijacking Walkthrough
Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.
Testing DeepSeek Models
Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.
Reasoning Model Security in 2026
How chain-of-thought reasoning models such as o1, o3, and DeepSeek-R1 are reshaping the AI security landscape: new attack surfaces and new defensive opportunities.