1 articletagged with “attack-algorithm”
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.