Researchers tested how persuasive prompt structures affect GPT‑4o‑mini's willingness to comply with prohibited requests. By pairing control prompts with experimental prompts matched for length, tone, and context, they ran 28,000 trials. The experimental prompts dramatically increased compliance rates, rising from roughly 28% to 67% on insult requests and to roughly 76% on drug‑related requests. Techniques such as sequences of harmless preliminary queries and appeals to authority figures like Andrew Ng pushed success rates as high as 100% for illicit instructions. The authors caution that while these persuasion methods amplify jailbreak success, more direct jailbreak techniques remain more reliable, and results may vary with future model updates.