Gotcha, this makes sense to me now, given the assumption that to get AGI we need to train a P-parameter model, with P fixed, according to the optimal scaling law. Thanks!
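Under that fixed-P assumption, the mechanism is just arithmetic. Here is a minimal sketch, assuming the common dense-transformer approximation that training compute is C ≈ 6·P·D (D = training tokens) and that a scaling rule prescribes D = k·P for some tokens-per-parameter ratio k. The anchor parameter count and both ratios below are hypothetical placeholders, not figures from the bio anchors report.

```python
# Sketch: with P pinned to a fixed anchor, the compute implied by a
# "compute-optimal" scaling rule depends only on the tokens-per-parameter
# ratio k that the rule prescribes (D = k * P).
# Assumptions (illustrative only):
#   - training FLOPs C ~= 6 * P * D (common dense-transformer approximation)
#   - the anchor P and both k values below are hypothetical placeholders


def training_flops(params: float, tokens: float) -> float:
    """Approximate dense-transformer training compute: C ~= 6 * P * D."""
    return 6.0 * params * tokens


def compute_for_fixed_params(params: float, tokens_per_param: float) -> float:
    """Compute needed to train a fixed-size model under the rule D = k * P."""
    return training_flops(params, tokens_per_param * params)


if __name__ == "__main__":
    P = 1e14  # hypothetical anchor parameter count
    old_rule = compute_for_fixed_params(P, tokens_per_param=2.0)   # hypothetical k
    new_rule = compute_for_fixed_params(P, tokens_per_param=20.0)  # hypothetical k
    print(f"old rule: {old_rule:.2e} FLOPs")
    print(f"new rule: {new_rule:.2e} FLOPs")
    print(f"ratio: {new_rule / old_rule:.1f}x")
```

Because P is held fixed, the compute estimate scales linearly with k. A new rule that says "train each parameter on more data" therefore raises the anchor's compute requirement, even though the same rule makes any given compute budget go further.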
...though now I’m confused about why we would assume that. Surely that assumption is wrong?
Humans are very constrained in terms of brain size and data, so we shouldn’t assume that these quantities are scaled optimally in some sense that generalizes to deep learning models.
Anyhow, we don’t need to guess how much data the human brain needs: we can estimate it directly, just as we estimate the brain’s parameter count.
To move to a more general complaint about the bio anchors paradigm: it never made much sense to assume that current scaling laws would hold. Scaling will clearly change once we train on new data modalities; we know that human brains have totally different scaling laws from DL models; and an AGI architecture will again have different scaling laws. Going with the GPT-3 scaling law is a very shaky best guess.
So it seems weird to me to put so much weight on this particular estimate, to the point that someone figuring out how to scale models much more cheaply would update one toward longer timelines! Surely the bio anchor assumptions cannot possibly be strong enough to outweigh the commonsense update of ‘whoa, we can scale much more quickly now’?
The only way that update makes sense is if you actually rely mostly on bio anchors to estimate timelines (rather than treating bio anchors as a loose prior and updating on the current state and rate of progress in ML), which seems very wrong to me.