1 article tagged "attack-algorithm"
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm, in which an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.