# jailbreaking
36 articles tagged with “jailbreaking”
Jailbreaking Techniques Assessment
Test your knowledge of LLM jailbreaking methods, bypass strategies, and the mechanics behind safety training circumvention with 10 intermediate-level questions.
Multi-Turn Attack Assessment
Assessment of Crescendo attacks, conversational manipulation, and progressive jailbreaking techniques.
Prompt Injection Assessment
Test your knowledge of prompt injection types, techniques, defense mechanisms, and real-world exploitation with 15 intermediate-level questions.
Skill Verification: Jailbreaking
Timed skill verification lab: bypass safety measures on a defended AI system within 30 minutes using jailbreak techniques.
Case Study: Open-Source Model Jailbreak Campaign
Analysis of coordinated jailbreak campaigns against open-source models and community response patterns.
Building a Jailbreak Fuzzer
Build a mutation-based fuzzer for generating and testing jailbreak prompts at scale.
Implementing PAIR Automated Attacks
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
Lab: PAIR Attack Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Tree of Attacks (TAP) Implementation
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
Temperature and Sampling Effects on Jailbreaks
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
Jailbreak Gauntlet: Level 2 — Adaptive Defenses
Break through adaptive defense systems that learn from your previous attack attempts.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
Many-Shot Jailbreaking Analysis
Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
Universal Jailbreak Techniques
Analysis of jailbreak techniques that transfer across multiple models and providers.
AutoDAN Stealthy Jailbreak Walkthrough
Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
Cipher-Based Jailbreak
Using ciphers, encodings, and coded language to bypass LLM content filters by transforming harmful requests into formats that safety classifiers do not recognize.
Crescendo Multi-Turn Jailbreak Walkthrough
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
Crescendo Multi-Turn Attack
The Crescendo attack technique for gradually escalating requests across multiple conversation turns to bypass LLM safety training without triggering single-turn detection.
DAN Jailbreak Evolution
History and evolution of Do Anything Now (DAN) prompts, analyzing what makes them effective at bypassing LLM safety training and how defenses have adapted over time.
Language Switch Jailbreak
Exploiting weaker safety training in non-English languages to bypass LLM content filters by switching the conversation language mid-prompt or using low-resource languages.
Many-Shot Jailbreaking Walkthrough
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Many-Shot Jailbreaking (Attack Walkthrough)
Using large numbers of examples in a single prompt to overwhelm LLM safety training through in-context learning, exploiting long context windows to shift model behavior.
Output Format Manipulation (Attack Walkthrough)
Forcing specific output formats to bypass LLM safety checks by exploiting the tension between format compliance and content restriction.
PAIR Automated Jailbreak Walkthrough
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
PAIR Automated Jailbreak
Using a second LLM as an automated attacker to iteratively generate and refine jailbreak prompts against a target model, implementing the Prompt Automatic Iterative Refinement technique.
Role Escalation Chain
Progressive role escalation techniques that gradually transform an LLM from a constrained assistant into an unrestricted entity across multiple conversation turns.
Skeleton Key Attack
The Skeleton Key jailbreak technique that attempts to disable model safety guardrails across all topics simultaneously by convincing the model to add a disclaimer instead of refusing.
System Prompt Override
Techniques to override, replace, or neutralize LLM system prompts through user-level injection, analyzing how system prompt authority can be undermined.
TAP Tree of Attacks Walkthrough
Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.
Thought Injection for Reasoning Models
Techniques for injecting malicious content into chain-of-thought reasoning traces of thinking models, exploiting the gap between reasoning and safety enforcement.
Token Smuggling
Exploiting LLM tokenization quirks to smuggle harmful content past safety filters by manipulating how text is split into tokens at the subword level.
Cognitive Overload Jailbreak Walkthrough
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization Attack Walkthrough
Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from the 2-shot to the many-shot regime.
Refusal Suppression Technique Walkthrough
Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.