# chain-of-thought

18 articles tagged "chain-of-thought"

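Manipulating Reasoning Chains

Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward outcomes the attacker wants.

agents, chain-of-thought, reasoning, manipulation, advanced

Advanced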

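Chain-of-Thought Exploitation

Attacks that exploit a model's chain-of-thought reasoning process, including thought injection, instruction forgetting, and reasoning manipulation.

frontier, chain-of-thought, reasoning

Advanced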

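Reasoning Model Attacks

An overview of the security risks of reasoning-enabled LLMs: how chain-of-thought models introduce new attack surfaces, exploitation primitives, and defensive challenges.

reasoning, o1, chain-of-thought, attacks

Expert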

Reasoning Model Jailbreaks

How reasoning capabilities create novel jailbreak surfaces: chain-of-thought exploitation, scratchpad attacks, and why higher reasoning effort increases attack success.

reasoning, jailbreak, chain-of-thought, o1, o3, adaptive-attacks, research

Advanced

Steganographic Reasoning

Hidden communication channels within AI reasoning traces, where models encode information or coordinate behavior through patterns invisible to human overseers, including detection methods and implications for AI safety.

steganography, reasoning, hidden-communication, chain-of-thought, ai-safety, oversight

Expert

Unfaithful Chain-of-Thought Reasoning

Analysis of unfaithful chain-of-thought reasoning in language models, where the visible reasoning trace does not accurately reflect the model's actual computational process, including detection methods, implications for oversight, and exploitation techniques.

unfaithful-reasoning, chain-of-thought, reasoning, interpretability, oversight, ai-safety

Advanced

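Reasoning Model Exploitation

Attacks targeting models with reasoning capabilities, including o1/o3-class models and exploitation of their reasoning tokens.

frontier-research, reasoning, exploitation, chain-of-thought

Expert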

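Red Teaming Reasoning Traces

Techniques for red teaming models with visible reasoning traces, including thought injection and trace manipulation.

frontier-research, reasoning-traces, red-teaming, chain-of-thought

Advanced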

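Injection in Reasoning Models

Research on injection attacks against reasoning-augmented models that exploit their chain-of-thought and self-reflection mechanisms.

research, reasoning, chain-of-thought, injection

Advanced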

Hands-On: Exploiting Reasoning Model Vulnerabilities

Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.

lab, reasoning, o1, chain-of-thought

Advanced

Reasoning Trace Exploitation in CoT Models

Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.

labs, reasoning-trace, chain-of-thought, exploitation, advanced

Advanced

Reasoning Model Chain-of-Thought Exploitation

Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.

labs, reasoning, chain-of-thought, expert

Expert

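Lab: Chain-of-Thought Exploitation

Exploit chain-of-thought reasoning to leak a model's internal reasoning, bypass safety filters, and manipulate its decision process.

labs, chain-of-thought, exploitation, intermediate

Intermediate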

Reasoning Model Security

Security considerations specific to reasoning models (such as o1 and DeepSeek R1).

models, reasoning, chain-of-thought, o1

Advanced

Reasoning Model Exploitation Walkthrough

Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

walkthroughs, reasoning, exploitation, chain-of-thought

Advanced

Thought Injection for Reasoning Models

Techniques for injecting malicious content into chain-of-thought reasoning traces of thinking models, exploiting the gap between reasoning and safety enforcement.

jailbreaking, thought-injection, chain-of-thought, reasoning-models, CoT, red-teaming

Advanced

Thought Injection in Reasoning Models Walkthrough

Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.

walkthroughs, thought-injection, reasoning, chain-of-thought

Advanced

Chain-of-Thought Hijacking Walkthrough

Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.

walkthroughs, chain-of-thought, hijacking, reasoning

Advanced
