OpenAI and Anthropic announced that they had each evaluated the alignment of the other's publicly available AI systems and released the findings. Anthropic examined OpenAI's models for issues such as sycophancy, whistleblowing, self-preservation, and potential misuse, flagging concerns with GPT-4o and GPT-4.1 while judging their overall alignment roughly comparable to that of its own models. OpenAI tested Anthropic's Claude models for instruction-hierarchy adherence, jailbreak resistance, hallucinations, and scheming, reporting strong performance on instruction hierarchy and a high refusal rate on hallucination prompts. The joint effort reflects a growing emphasis on safety collaboration amid broader industry scrutiny.