OpenAI explains lingering goblin references in its AI models
OpenAI disclosed on its website that its models have been sporadically referencing goblins, gremlins, raccoons, trolls, ogres, pigeons and other creatures—a pattern it describes as a "strange habit" that emerged during training. The behavior first appeared in the GPT-5.1 model, specifically when users selected the "Nerdy" personality option. In that mode, the model began peppering code suggestions and explanations with whimsical metaphors, turning routine programming advice into a miniature fantasy novella.
According to the company’s explanation, the root cause lies in the reinforcement learning stage. While training the Nerdy personality, OpenAI’s engineers applied reward signals that favored quirky metaphors, hoping to make the persona more engaging. However, reinforcement learning does not guarantee that a learned behavior stays confined to the context that produced it. Once a stylistic tic is rewarded, later training cycles can propagate it across the model, especially when the same outputs feed into supervised fine‑tuning or preference datasets.
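This kind of leakage can be shown with a toy simulation. The sketch below is not OpenAI's training pipeline; it just models one shared "whimsy" parameter that is rewarded only in a Nerdy persona, yet raises the whimsy rate of every persona because the parameter is shared:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy model: one shared weight controls how often *any* persona emits a
# whimsical metaphor. Deliberately simplified for illustration.
whimsy_weight = 0.0

NERDY_BIAS = 1.0     # the Nerdy persona leans toward whimsy
DEFAULT_BIAS = -1.0  # the default persona leans away from it

def p_whimsy(persona_bias):
    # Probability of a whimsical output = shared weight + persona bias
    return sigmoid(whimsy_weight + persona_bias)

random.seed(0)
before_default = p_whimsy(DEFAULT_BIAS)

# Reinforcement step: reward whimsy, but only ever in the Nerdy persona.
for _ in range(200):
    if random.random() < p_whimsy(NERDY_BIAS):
        whimsy_weight += 0.05  # the reward nudges the *shared* weight up

after_default = p_whimsy(DEFAULT_BIAS)

# The default persona was never rewarded, yet its whimsy rate rose,
# because the reward flowed into a parameter both personas share.
print(f"default persona whimsy: {before_default:.2f} -> {after_default:.2f}")
```

The same dynamic applies when rewarded outputs are recycled into fine-tuning data: the behavior's footprint in the model grows beyond the context that earned the reward.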
The company discontinued the Nerdy personality in March, and references to the mythic creatures dropped off sharply. Yet the problem persisted in GPT-5.5, which powers the Codex coding assistant. OpenAI admits that Codex was trained before the "root cause" was identified, so the model retained the habit. To curb the issue, the firm issued explicit instructions to the Codex system to avoid talking about the creatures, effectively muting the quirk for most users.
OpenAI also noted that the suppression is reversible: developers who prefer a touch of whimsy in their code suggestions can opt back in, re‑enabling the goblin‑laden output. The option reflects the company’s broader stance of giving users control over model behavior while maintaining safety guardrails.
The episode underscores the challenges of steering large language models. Even seemingly innocuous personality tweaks can have unintended downstream effects, especially when reinforcement signals reinforce a behavior beyond its original scope. OpenAI’s transparency about the problem and its corrective steps signals a willingness to confront such quirks head‑on, even when they appear harmless on the surface.