You seem to think that imitation resulted in LLMs quickly saturating on an S-curve, but relevant metrics (e.g. time horizon) seem to advance smoothly, without a clear reduction in slope from the regime where pretraining was being rapidly scaled up (e.g. up to and through GPT-4) to the period after (in fact, the slope seems somewhat higher).
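A toy sketch of that slope comparison, with invented time-horizon numbers standing in for the real measurements (the cutoff month is likewise arbitrary, not an actual GPT-4 release date):

```python
import numpy as np

# Illustrative (month, time-horizon-in-minutes) points -- invented numbers,
# not real measurements; only the shape of the check matters.
months = np.array([0, 12, 24, 36, 48, 60, 66, 72])
horizon = np.array([0.1, 0.3, 0.9, 2.5, 8.0, 30.0, 55.0, 110.0])

# Fit log2(horizon) against time separately before and after an assumed
# "pretraining scale-up ended" cutoff (month 48 here). A lower post-cutoff
# slope would indicate saturation; a higher one indicates acceleration.
pre = months <= 48
pre_slope = np.polyfit(months[pre], np.log2(horizon[pre]), 1)[0]
post_slope = np.polyfit(months[~pre], np.log2(horizon[~pre]), 1)[0]
print(f"doublings per month before cutoff: {pre_slope:.3f}")
print(f"doublings per month after cutoff:  {post_slope:.3f}")
```

On these toy numbers the post-cutoff slope comes out slightly steeper, which is the pattern described above.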
Presumably you think some qualitative notion of intelligence (which is hard to measure) has slowed down?
My view is that basically everything is progressing relatively smoothly and there isn’t anything which is clearly stalled in a robust way.
That’s not the relevant metric. The process of training involves a model skyrocketing in capabilities, from a random initialization to a human-ish level (or the surface appearance of it, at least). There’s a simple trick – pretraining – which lets you push a model’s intelligence from zero to that level.
Advancing past this point then slows down to a crawl: each further advance requires new incremental research done by humans, rather than just turning a compute crank.
(Indeed, IIRC a model’s loss curves across training do look like S-curves? Edit: on looking it up, apparently not.)
The FOOM scenario, on the other hand, assumes a paradigm that grows from random initialization to human level to superintelligence all in one go, as part of the same training loop, without a phase change from “get it to human level incredibly fast, over months” to “painstakingly and manually improve the paradigm past the human level, over years/decades”.
Relevant metrics of performance are roughly linear in log-compute when compute is utilized effectively in the current paradigm for training frontier models.
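As a concrete (hypothetical) reading of “linear in log-compute”: a benchmark score modeled as score ≈ a + b·log10(compute), fit here to invented numbers just to show the functional form:

```python
import numpy as np

# Invented (training compute in FLOP, benchmark score) pairs -- not real data.
compute = np.array([1e22, 1e23, 1e24, 1e25, 1e26])
score = np.array([35.0, 44.0, 52.5, 61.0, 70.5])

# "Roughly linear in log-compute": each 10x of compute buys a roughly
# constant increment in score, i.e. score ~= a + b * log10(compute).
b, a = np.polyfit(np.log10(compute), score, 1)
print(f"score ~= {a:.1f} + {b:.1f} * log10(compute)")
print(f"extrapolated score at 1e27 FLOP: {a + b * 27:.1f}")
```

The point is only the functional form: linear in log-compute means roughly constant returns per order of magnitude of compute, not per FLOP.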
From my perspective it looks like performance has been steadily advancing as you scale up compute and other resources.
(This isn’t to say that pretraining hasn’t had lower returns recently, but you made a stronger claim.)
I think one of the (many) reasons people have historically tended to miscommunicate/talk past each other so much about AI timelines is that the perceived suddenness of growth rates depends heavily on your choice of time span. (As Eliezer puts it, “Any process is continuous if you zoom in close enough.”)
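A toy numerical version of that point, assuming (purely for illustration) a capability metric that doubles every 6 months:

```python
import numpy as np

# Purely illustrative: a metric doubling every 6 months.
months = np.arange(0, 49)
metric = 2.0 ** (months / 6)

# The same exponential reads as "gradual" or "sudden" depending on the span
# you evaluate it over.
month_over_month = metric[1] / metric[0]   # ~1.12x per month
two_year_window = metric[24] / metric[0]   # 16x over a 24-month window
print(f"month-over-month growth: {month_over_month:.2f}x")
print(f"growth across a 2-year window: {two_year_window:.0f}x")
```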
It sounds to me like you guys (Thane and Ryan) agree about the growth rate of the training process, but are assessing its perceived suddenness/continuousness relative to different time spans?