I haven’t seen that. OpenAI gives the following explanation:
As goblin and gremlin mentions increased under the Nerdy personality, they increased by nearly the same relative proportion in samples without it. Taken together, the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
That creates a feedback loop:
Playful style is rewarded.
Some rewarded examples contain a distinctive lexical tic.
The tic appears more often in rollouts.
Model-generated rollouts are used for supervised fine-tuning (SFT).
The model gets even more comfortable producing the tic.
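To make the loop concrete, here is a toy sketch of how a small reward-side preference could compound once rewarded rollouts are fed back in as SFT data. Everything in it is an assumption for illustration: the function name, the `reward_bias` odds multiplier, the `sft_rate` mixing weight, and the starting rate are hypothetical and are not taken from OpenAI's description.

```python
def simulate_tic_feedback(p0=0.01, reward_bias=2.0, sft_rate=0.5, rounds=5):
    """Toy model of the loop above (all parameters are hypothetical).

    p is the probability that a rollout contains the lexical tic.
    Returns the tic rate before training and after each round.
    """
    history = [p0]
    p = p0
    for _ in range(rounds):
        # Rewarded examples over-represent the tic: multiply its odds
        # by reward_bias to get the tic rate among selected rollouts.
        odds = p / (1 - p) * reward_bias
        p_selected = odds / (1 + odds)
        # SFT on the selected rollouts pulls the model part of the way
        # toward their tic rate.
        p = (1 - sft_rate) * p + sft_rate * p_selected
        history.append(p)
    return history


if __name__ == "__main__":
    for step, rate in enumerate(simulate_tic_feedback()):
        print(f"round {step}: tic rate {rate:.4f}")
```

With `reward_bias` above 1 the rate rises every round; with `reward_bias` equal to 1 it stays put, which is the "no preference, no drift" baseline. This says nothing about why the tic was preferred in the first place, only how a small preference could grow.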
Right, I actually read that. But is it not missing an explanation of why those mentions increased under the Nerdy personality in the first place? If the Simon Willison post (which I also haven’t seen anyone else discussing) was the origin, that seems worth noting and understanding. And both its timing and Simon’s nerdiness (in a good way) seem to fit.
Update: Never mind, apparently people were already noticing goblin mentions in April 2025, months prior to that post.
Isn’t the explanation just that an influential AI blogger called GPT-5 his “Research Goblin”?