Assuming I’m understanding correctly:
Nice argument. I guess I have a bit more confidence in the scaling laws than you. However, I definitely still agree that our uncertainty about the 2023 training compute requirements for AGI should range over many OOMs.
However, what does this have to do with horizon length? I guess the idea is that the proper scaling law shouldn't be assumed to be a function of data points alone, but rather a function of data points & the type of task you are training on, and plausibly for longer-horizon tasks you need less data (especially with techniques like imitation learning + finetuning, etc.)? Yep, that also seems very plausible to me; it's a big part of why my timelines are much shorter than Ajeya's.
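To make the distinction concrete, here's a minimal back-of-the-envelope sketch with entirely made-up numbers (C_SHORT, N_SHORT, the 1e6-second horizon, and the sqrt discount are all illustrative assumptions, not anyone's actual estimates). The point is just that if long-horizon training needs sublinearly many data points in the horizon, rather than linearly many, the compute estimate drops by several OOMs:

```python
import math

# Toy numbers, purely illustrative:
C_SHORT = 1e15  # FLOP to generate/train on one 1-second data point (assumption)
N_SHORT = 1e10  # data points needed to learn a 1-second-horizon task (assumption)
H = 1e6         # horizon of the target task, in subjective seconds (~2 weeks)

# View A: the scaling law is a function of data points alone, and a
# long-horizon task needs just as many data points, each H times costlier.
compute_a = N_SHORT * H * C_SHORT                   # ~1e31 FLOP

# View B: the scaling law also depends on task type; with imitation learning
# + finetuning, the long-horizon stage needs far fewer samples, say sublinear
# in H (sqrt is an arbitrary stand-in for "sublinear").
compute_b = (N_SHORT / math.sqrt(H)) * H * C_SHORT  # ~1e28 FLOP

print(f"OOM gap between the two views: {math.log10(compute_a / compute_b):.1f}")
# -> 3.0
```

Under these (made-up) inputs the two views disagree by ~3 OOMs, which is the kind of gap that would meaningfully shift a timelines estimate.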