You might think that, because LLMs are grown without much understanding and trained only to predict human text, they cannot do anything except regurgitate human utterances. But that would be incorrect. [...]
Furthermore, AIs nowadays are not trained only to predict human-generated text. An AI-grower might give their AI sixteen tries at solving a math problem, thinking aloud in words about how to solve it; then, the “chain-of-thought” for whichever of the sixteen tries went best would get further reinforced by gradient descent, yielding what’s called a reasoning model. That’s a sort of training that can push AIs to think thoughts no human could think.
How does that conclusion follow? If a base model can only regurgitate human utterances, how does generating sixteen utterances and then reinforcing some of them lead to it… not regurgitating human utterances?
In the first sentence, Eliezer and Nate are (explicitly) stating that LLMs can say things that are not just regurgitations of human utterances.
Sure; but the following sections are meant as explanations/justifications of why that is the case. The paragraph I omitted does a good job of explaining why LLMs would need to learn to predict the world at large, not just humans, and would therefore contain more than just human-mimicry algorithms. To reinforce that with the point about reasoning models, one could perhaps explain how that "generate sixteen CoTs, pick the best" training can push LLMs to recruit those hidden algorithms for the purposes of steering rather than just prediction, or even to incrementally develop entirely new skills.
A full explanation of reinforcement learning is probably not worth it (perhaps it was in the additional 200% of the book Eliezer wrote, but I agree it should’ve been aggressively pruned). But as-is, there are just clearly missing pieces here.
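To make the missing step concrete: the "generate sixteen tries, reinforce the best" loop is a selection pressure, and selection can push a model's outputs beyond the distribution it started with. Here is a minimal toy sketch of that dynamic. Everything in it is illustrative, not any real training API: the "model" is just a single probability parameter, a "chain of thought" is a sequence of good/bad steps, and "reinforcement" is a nudge toward the best sample's statistics.

```python
import random

def sample_cot(p_good, length=8, rng=random):
    """Sample a toy chain of thought; each step is 'good' with prob p_good."""
    return [1 if rng.random() < p_good else 0 for _ in range(length)]

def reward(cot):
    """Score a chain of thought: how many of its steps were good."""
    return sum(cot)

def best_of_n_update(p_good, n=16, lr=0.1, rng=random):
    """Generate n tries, keep the best, nudge the model toward it."""
    tries = [sample_cot(p_good, rng=rng) for _ in range(n)]
    best = max(tries, key=reward)
    # "Reinforce" the winning try: move p_good toward its empirical rate.
    target = sum(best) / len(best)
    return p_good + lr * (target - p_good)

rng = random.Random(0)
p = 0.5  # the model starts out no better than chance
for _ in range(200):
    p = best_of_n_update(p, rng=rng)
# p ends up far above 0.5: selection alone ratcheted the model up,
# even though every sample was drawn from the model's own distribution.
print(round(p, 2))
```

The point of the toy is that every individual sample stays "in distribution" for the current model, yet repeatedly reinforcing the best-of-sixteen tail drags the distribution itself somewhere new. That is the sense in which this training regime can, in principle, take an LLM past pure human mimicry, though whether it yields genuinely novel *skills* (rather than just sharpened existing ones) is exactly the part the book leaves unargued.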