I’m being a little bit too flippant about this, but you can see in the earlier comments that I mean that the bulk of the training is next-token prediction and then we bolt some RL on at the end to push the next-token predictor to answer questions correctly and write working code. This works, but it’s a much weaker optimization process, which gives you an agent that isn’t trying nearly as hard as the agents Eliezer was predicting (and from the frontier labs’ perspective this is bad, because an agent that’s really trying to hit a goal is much more effective until it kills you).
OK, fair point: according to an unreliable source that gives quick answers to questions, 75–85% of spending was on pretraining for the most recent models, leaving only 15–25% for post-training. (But a lot of that 15–25% was directly training the AI to get better at answering questions and writing code.)
But would you be willing to stake your life on the impossibility of creating a dangerously capable AI by doing 75–85% of the training as next-token prediction (and the analog of that for images and other kinds of data)? Do you think you understand the art of AI training well enough to be confident of that?
And even if you do, it would probably take you a long time to explain it to us. If the goal is to explain why the current crop of AIs hasn’t been doing destructive things whereas some future AI might prove extremely destructive, I would simply point out that our society has developed many measures to limit the damage done by destructive people. Since the current crop of AIs (at least those that have been widely deployed) is less capable than people are at the skills needed to do destructive things, the measures society has deployed against human criminals and sociopaths work very well against the current crop of AIs. This shifts the frame to whether the AI labs might someday create an AI that is much more capable than what they’ve created so far, i.e., to estimating the technological and scientific potential of the lines of inquiry being pursued by the AI labs, which we might be able to do without ever deciding whether it is possible to create a world-ending AI by spending 75–85% of the training costs on next-token prediction.