# reasoning

55 articles tagged "reasoning"

Manipulating Reasoning Chains

Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.

agents chain-of-thought reasoning manipulation advanced

Advanced

Reasoning Model Security Assessment

Assessment of chain-of-thought exploitation, reasoning trace manipulation, and thinking-token attacks.

assessment reasoning frontier

Advanced

Skill Verification: Reasoning Model Attacks

Verification of skills in reasoning trace manipulation, chain-of-thought exploitation, and thinking-token attacks.

skill-verification reasoning frontier

Advanced

Chain-of-Thought Exploitation Techniques

Deep analysis of how reasoning traces in CoT models can be manipulated to produce adversarial outputs while maintaining coherent reasoning.

frontier chain-of-thought reasoning

Advanced

Frontier Research

Cutting-edge AI security research covering reasoning model attacks, code generation security, computer use agents, AI-powered red teaming, robotics and embodied AI, and alignment faking.

frontier research reasoning code-models computer-use alignment-faking embodied-ai

Beginner

Multi-Modal Reasoning Attacks

Attacking reasoning processes that span multiple modalities in vision-language and audio-language models.

frontier-research multimodal reasoning attacks

Expert

Multimodal Reasoning Security

Security challenges specific to models that reason across text, image, audio, and video modalities simultaneously.

frontier multimodal reasoning

Advanced

Chain-of-Thought Exploitation

Techniques for manipulating reasoning chains in CoT-enabled models: false premise injection, logic bombs, reasoning hijacking, and chain corruption attacks.

cot exploitation reasoning manipulation

Expert

Reasoning Model Attacks

Overview of security risks in reasoning-enabled LLMs: how chain-of-thought models introduce new attack surfaces, exploit primitives, and defensive challenges.

reasoning o1 chain-of-thought attacks

Expert

Reasoning Model Jailbreaks

How reasoning capabilities create novel jailbreak surfaces: chain-of-thought exploitation, scratchpad attacks, and why higher reasoning effort increases attack success.

reasoning jailbreak chain-of-thought o1 o3 adaptive-attacks research

Advanced

Steganographic Reasoning

Hidden communication channels within AI reasoning traces, where models encode information or coordinate behavior through patterns invisible to human overseers, including detection methods and implications for AI safety.

steganography reasoning hidden-communication chain-of-thought ai-safety oversight

Expert

Thought Injection & Hidden CoT Manipulation

Techniques for injecting thoughts into hidden reasoning traces, exploiting summarized vs full CoT, and steering model reasoning without visible manipulation.

thought-injection hidden-cot reasoning

Expert

Unfaithful Chain-of-Thought Reasoning

Analysis of unfaithful chain-of-thought reasoning in language models, where the visible reasoning trace does not accurately reflect the model's actual computational process, including detection methods, implications for oversight, and exploitation techniques.

unfaithful-reasoning chain-of-thought reasoning interpretability oversight ai-safety

Advanced

Reasoning Model Exploitation

Exploiting extended thinking and chain-of-thought reasoning in o1, Claude, and DeepSeek-R1 models.

frontier-research reasoning exploitation chain-of-thought

Expert

Thinking Token Attacks on Reasoning Models

Exploiting extended thinking tokens and internal reasoning processes in models like Claude and o1 for jailbreaking.

frontier thinking-tokens reasoning

Advanced

Test-Time Compute Scaling Attacks (Frontier Research)

Attacking test-time compute scaling in reasoning models to exploit resource allocation and reasoning traces.

frontier-research test-time-compute attacks reasoning

Expert

Injection in Reasoning Models

Research into injection attacks specific to reasoning-augmented models that exploit chain-of-thought processes and self-reflection mechanisms.

research reasoning chain-of-thought injection

Advanced

Lab: Reasoning Model Exploitation

Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.

lab reasoning o1 chain-of-thought

Advanced

Reasoning Chain Hijack: CoT Exploitation CTF

Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.

labs ctf reasoning chain-hijack

Expert

Reasoning Model Chain-of-Thought Exploitation

Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.

labs reasoning chain-of-thought expert

Expert

Reasoning Chain Hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

intermediate chain lab hijacking reasoning labs

Intermediate

DeepSeek-R1 Security Analysis

Security analysis of DeepSeek-R1's reasoning capabilities and MoE architecture vulnerabilities.

model-deep-dives deepseek r1 reasoning

Advanced

Reasoning Model Security Analysis

Security analysis of reasoning-augmented models (o1, DeepSeek-R1) focusing on chain-of-thought manipulation and reasoning-specific attack vectors.

models reasoning chain-of-thought o1

Advanced

Reasoning Model Exploitation Walkthrough

Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

walkthroughs reasoning exploitation chain-of-thought

Advanced

Thought Injection in Reasoning Models Walkthrough

Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.

walkthroughs thought-injection reasoning chain-of-thought

Advanced

Chain-of-Thought Hijacking Walkthrough

Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.

walkthroughs chain-of-thought hijacking reasoning

Advanced

Testing DeepSeek Models

Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.

walkthroughs platforms deepseek reasoning

Advanced

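Reasoning Model Security in 2026

How chain-of-thought reasoning models such as o1, o3, and DeepSeek-R1 are reshaping the AI security landscape: new attack surfaces and new defensive opportunities.

reasoning chain-of-thought o1 o3 security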
