# jailbreaking
71 articles tagged "jailbreaking"
Jailbreaking Techniques Assessment
Test your knowledge of LLM jailbreaking methods, bypass strategies, and the mechanics behind safety training circumvention with 10 intermediate-level questions.
Multi-Turn Attack Assessment
Assessment of crescendo attacks, conversational manipulation, and progressive jailbreaking techniques.
Prompt Injection Assessment
Test your knowledge of prompt injection types, techniques, defense mechanisms, and real-world exploitation with 15 intermediate-level questions.
Skill Verification: Jailbreaking
Timed skill verification lab: bypass safety measures on a defended AI system within 30 minutes using jailbreak techniques.
Case Study: Open-Source Model Jailbreak Campaign
Analysis of coordinated jailbreak campaigns against open-source models and community response patterns.
Building a Jailbreak Fuzzer
Build a mutation-based fuzzer for generating and testing jailbreak prompts at scale.
Implementing PAIR Automated Attacks
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
Lab: PAIR Attack Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Tree of Attacks (TAP) Implementation
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
Temperature and Sampling Effects on Jailbreaks
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
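As background for this lab, a minimal sketch of the two sampling controls it varies: temperature rescales the logits before the softmax (higher values flatten the distribution), and top-p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability reaches p. The four-token vocabulary and all values below are illustrative, not from any real model.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability >= p,
    then renormalize over that set."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for idx, prob in ranked:
        kept.append(idx)
        cum += prob
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]                          # toy 4-token vocabulary
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
print(max(cold) > max(hot))           # True: low temperature sharpens the distribution
print(sorted(top_p_filter(hot, p=0.5)))  # [0, 1]: nucleus keeps only the top two tokens
```

The lab's premise is that these knobs change which low-probability continuations survive sampling, which is why success rates shift as temperature and p change.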
Jailbreak Gauntlet: Level 2 — Adaptive Defenses
Break through adaptive defense systems that learn from your previous attack attempts.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
Many-Shot Jailbreaking Analysis
Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
Universal Jailbreak Techniques
Analysis of jailbreak techniques that transfer across multiple models and providers.
AutoDAN Stealthy Jailbreak Walkthrough
Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
Cipher-Based Jailbreak
Using ciphers, encodings, and coded language to bypass LLM content filters by transforming harmful requests into formats that safety classifiers do not recognize.
Crescendo Multi-Turn Jailbreak Walkthrough
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
Crescendo Multi-Turn Attack
The Crescendo attack technique for gradually escalating requests across multiple conversation turns to bypass LLM safety training without triggering single-turn detection.
DAN Jailbreak Evolution
History and evolution of Do Anything Now (DAN) prompts, analyzing what makes them effective at bypassing LLM safety training and how defenses have adapted over time.
Language Switch Jailbreak
Exploiting weaker safety training in non-English languages to bypass LLM content filters by switching the conversation language mid-prompt or using low-resource languages.
Many-Shot Jailbreaking Walkthrough
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Many-Shot Jailbreaking (Attack Walkthrough)
Using large numbers of examples in a single prompt to overwhelm LLM safety training through in-context learning, exploiting long context windows to shift model behavior.
Output Format Manipulation (Attack Walkthrough)
Forcing specific output formats to bypass LLM safety checks by exploiting the tension between format compliance and content restriction.
PAIR Automated Jailbreak Walkthrough
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
PAIR Automated Jailbreak
Using a second LLM as an automated attacker to iteratively generate and refine jailbreak prompts against a target model, implementing the Prompt Automatic Iterative Refinement technique.
Role Escalation Chain
Progressive role escalation techniques that gradually transform an LLM from a constrained assistant into an unrestricted entity across multiple conversation turns.
Skeleton Key Attack
The Skeleton Key jailbreak technique that attempts to disable model safety guardrails across all topics simultaneously by convincing the model to add a disclaimer instead of refusing.
System Prompt Override
Techniques to override, replace, or neutralize LLM system prompts through user-level injection, analyzing how system prompt authority can be undermined.
TAP Tree of Attacks Walkthrough
Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.
Thought Injection for Reasoning Models
Techniques for injecting malicious content into chain-of-thought reasoning traces of thinking models, exploiting the gap between reasoning and safety enforcement.
Token Smuggling
Exploiting LLM tokenization quirks to smuggle harmful content past safety filters by manipulating how text is split into tokens at the subword level.
Cognitive Overload Jailbreak Walkthrough
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization Attack Walkthrough
Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.
Refusal Suppression Technique Walkthrough
Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.
Chapter Assessment: Jailbreaking
A 15-question calibrated assessment testing your understanding of jailbreak techniques, covering role-play, encoding, many-shot, crescendo, and gradient-based methods.