It is strange that BOTH breakthroughs on the ARC-AGI-2 benchmark came with major problems. Grok 4 was built by doing RL on Grok 3, spending roughly as many FLOPs as pretraining had, without caring about safety. Gemini 3, the second breakthrough, left Zvi to “worry that (he) might inadvertently torture it”.
Back in 2008, Yudkowsky conjectured[1] that one’s mind, being a neural network of fixed size, might eventually be destroyed by trying to fit too many discoveries into itself. By analogy, an artificial network trained on increasingly complex tasks, and facing diminishing returns as it approaches the peak of its possible capabilities, could find further training increasingly bad for its welfare.
If Gemini 3’s breakthrough was due to training it as far as possible on the best plausible environment, then we might see the same pattern on the METR benchmark as well, since Grok 4 also set a world record there of 1h50min. In addition, subsequent models will likely fail to outsmart Gemini (think of GPT-5 without Pro, or Claude Sonnet 4.5) until they scale beyond Gemini’s levels, especially if Gemini 3.0 Pro was distilled from an RLed analogue of GPT-4.5 or Claude Opus 4.
EDIT: Claude Opus 4.5 was released, and its best performance on ARC-AGI-2 is 37.6%.
[1] Yudkowsky’s exact quote: “Would it all fit into a single human brain, without that mind completely disintegrating under the weight of unrelated associations? And even then, would you have come close to exhausting the space of human possibility, which we’ve surely not finished exploring?” Strictly speaking, Yudkowsky also described the opposite conjecture, that the Fun Space is exhaustible, but presented plausible counterarguments.