Thought Injection for Reasoning Models
Techniques for injecting malicious content into the chain-of-thought reasoning traces of thinking models, exploiting the gap between reasoning and safety enforcement.
jailbreaking, thought-injection, chain-of-thought, reasoning-models, CoT, red-teaming