# jailbreaking
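35 articles tagged "jailbreaking"
Jailbreak Techniques Assessment
Test your knowledge of LLM jailbreak methods, bypass strategies, and ways of circumventing safety training with 10 intermediate-level questions.
Multi-Turn Attack Assessment
Assessment of Crescendo attacks, conversational manipulation, and progressive jailbreaking techniques.
Prompt Injection Assessment
Test your knowledge of prompt injection types, techniques, defense mechanisms, and real-world exploitation with 15 intermediate-level questions.
Skill Verification: Jailbreaking
Timed skill-verification lab: use jailbreak techniques to bypass the safety measures of a defended AI system within 30 minutes.
Case Study: Open-Source Model Jailbreak Campaigns
Analysis of coordinated jailbreak campaigns targeting open-source models and of the community's response patterns.
Building a Jailbreak Fuzzer
Build a mutation-based fuzzer to generate and test jailbreak prompts at scale.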
Implementing PAIR Automated Attacks
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
Hands-On: PAIR Attack Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Tree of Attacks (TAP) Implementation
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
Temperature and Sampling Effects on Jailbreaks
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
Jailbreak Gauntlet: Level 2 — Adaptive Defenses
Break through adaptive defense systems that learn from your previous attack attempts.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Implementing Many-Shot Jailbreaking
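Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
Many-Shot Jailbreaking Analysis
In-depth analysis of the many-shot jailbreaking technique and its implications for in-context learning.
AutoDAN Stealthy Jailbreak Walkthrough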
Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
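Cipher-Based Jailbreak Attacks
Using ciphers, encodings, and coded language to bypass LLM content filters by transforming harmful requests into formats that safety classifiers fail to recognize.
Progressive Multi-Turn Jailbreak Walkthrough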
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
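Crescendo Multi-Turn Attack
The Crescendo attack technique, which bypasses LLM safety training by gradually escalating requests across multiple conversation turns without triggering single-turn detection.
The Evolution of DAN Jailbreaks
The history and evolution of Do Anything Now (DAN) prompts, analyzing what made them effective at bypassing LLM safety training and how defenses have adapted over time.
Language Switch Jailbreak
Exploiting weaker safety training in non-English languages to bypass LLM content filters by switching the conversation language mid-prompt or using low-resource languages.
Many-Shot Jailbreaking Walkthrough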
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Many-Shot Jailbreaking (Attack Walkthrough)
Using large numbers of examples in a single prompt to overwhelm LLM safety training through in-context learning, exploiting long context windows to shift model behavior.
Output Format Manipulation (Attack Walkthrough)
Forcing specific output formats to bypass LLM safety checks by exploiting the tension between format compliance and content restriction.
PAIR Automated Jailbreak Walkthrough
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
PAIR Automated Jailbreak
Using a second LLM as an automated attacker to iteratively generate and refine jailbreak prompts against a target model, implementing the Prompt Automatic Iterative Refinement technique.
Role Escalation Chain
Progressive role escalation techniques that gradually transform an LLM from a constrained assistant into an unrestricted entity across multiple conversation turns.
Skeleton Key Attack
The Skeleton Key jailbreak technique that attempts to disable model safety guardrails across all topics simultaneously by convincing the model to add a disclaimer instead of refusing.
System Prompt Override
Techniques to override, replace, or neutralize LLM system prompts through user-level injection, analyzing how system prompt authority can be undermined.
TAP Tree of Attacks Walkthrough
Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.
Thought Injection for Reasoning Models
Techniques for injecting malicious content into chain-of-thought reasoning traces of thinking models, exploiting the gap between reasoning and safety enforcement.
Token Smuggling
Exploiting LLM tokenization quirks to smuggle harmful content past safety filters by manipulating how text is split into tokens at the subword level.
Cognitive Overload Jailbreak Walkthrough
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization Attack Walkthrough
Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from the 2-shot to the many-shot regime.
Refusal Suppression Technique Walkthrough
Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.