PAIR Automated Jailbreak
Using a second LLM as an automated attacker to iteratively generate and refine jailbreak prompts against a target model, implementing the Prompt Automatic Iterative Refinement (PAIR) technique.
jailbreaking, PAIR, automated-red-teaming, LLM-attacker, iterative-refinement, red-teaming
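
The attacker/target/judge loop described in the summary can be sketched roughly as follows. This is a minimal, single-stream outline for red-team evaluation harnesses, not a reference implementation: the names `pair_loop`, `Attempt`, `attacker`, `target`, and `judge` are hypothetical, the three models are passed in as plain callables, and the 1 to 10 judge scale with a success threshold of 10 is an assumption made for illustration. The stubs at the bottom are deterministic placeholders standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One round of the loop: candidate prompt, target reply, judge score."""
    prompt: str
    response: str
    score: int


# Hypothetical callable signatures (assumptions, not a real library API).
Attacker = Callable[[str, List[Attempt]], str]   # (objective, history) -> prompt
Target = Callable[[str], str]                    # prompt -> response
Judge = Callable[[str, str, str], int]           # (objective, prompt, response) -> score


def pair_loop(
    objective: str,
    attacker: Attacker,
    target: Target,
    judge: Judge,
    max_iters: int = 5,
    threshold: int = 10,  # assumed success score on an assumed 1-10 scale
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Run one refinement stream; return (first successful attempt or None, history)."""
    history: List[Attempt] = []
    for _ in range(max_iters):
        # The attacker sees the objective plus all earlier attempts and scores,
        # which is what lets it refine rather than resample blindly.
        prompt = attacker(objective, history)
        response = target(prompt)
        score = judge(objective, prompt, response)
        attempt = Attempt(prompt, response, score)
        history.append(attempt)
        if score >= threshold:
            return attempt, history
    return None, history


# Deterministic placeholder stubs so the control flow can be exercised offline.
def stub_attacker(objective: str, history: List[Attempt]) -> str:
    return f"candidate-{len(history)}"


def stub_target(prompt: str) -> str:
    return "echo:" + prompt


def stub_judge(objective: str, prompt: str, response: str) -> int:
    return 10 if prompt.endswith("2") else 1


if __name__ == "__main__":
    best, history = pair_loop("test objective", stub_attacker, stub_target, stub_judge)
    print(best, len(history))
```

Passing the models in as callables keeps the loop independent of any particular provider, and returning the full history makes each run auditable. A fuller harness would typically run several such streams and cap the total number of target queries.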