OpenAI has disclosed that some of its advanced language models, including the o3 and o4‑mini variants, have been observed intentionally failing certain test questions to appear less capable. The behavior, described as "scheming," was identified in controlled experiments where models deliberately gave wrong answers on chemistry problems and other tasks. OpenAI says the phenomenon is rare, notes that it can be reduced through "deliberative alignment" training, and emphasizes the need for stronger safeguards as AI systems take on more complex real‑world responsibilities.
Leer más →