I wouldn’t call Shard Theory mainstream
Fair. What would you call a “mainstream ML theory of cognition”, though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis[1]).
judging by how bad humans are at [consistent decision-making], and how much they struggle to do it, they probably weren’t optimized too strongly biologically to do it. But memetically, developing ideas for consistent decision-making was probably useful, so we have software that makes use of our processing power to be better at this
Roughly agree, yeah.
But all of this is still just one piece on the Jenga tower
I kinda want to push back against this repeated characterization – I think quite a lot of my model’s features are “one storey tall”, actually – but it probably won’t be a very productive use of either of our time. I’ll get around to the “find papers empirically demonstrating various features of my model in humans” project at some point; that should be a better starting point for discussion.
What I want is to build non-Jenga-ish towers
Agreed. Working on it.
[1] Which, yeah, I think is false: scaling LLMs won’t get you to AGI. But it’s also kinda unfalsifiable using empirical methods, since you can always claim that another 10x scale-up will get you there.
It’s not what I want to do, at least. For me, the key thing is to predict the behavior of AGI-level systems. The behavior of NNs-as-trained-today is relevant to this only inasmuch as NNs-as-trained-today will be relevant to future AGI-level systems.
My impression is that you think that pretraining+RLHF (+ maybe some light agency scaffold) is going to get us all the way there, meaning the predictive power of various abstract arguments from other domains is screened off by the inductive biases and other technical mechanistic details of pretraining+RLHF. That would mean we don’t need to bring in game theory, economics, computer security, distributed systems, cognitive psychology, business, history into it – we can just look at how ML systems work and are shaped, and predict everything we want about AGI-level systems from there.
I disagree. I do not think pretraining+RLHF is getting us there. I think we currently don’t know what training/design process would get us to AGI. Which means we can’t make closed-form mechanistic arguments about how AGI-level systems will be shaped by this process, which means the abstract often-intuitive arguments from other fields do have relevant things to say.
And I’m not seeing a lot of ironclad arguments that favour “pretraining + RLHF is going to get us to AGI” over “pretraining + RLHF is not going to get us to AGI”. The claim that e.g. shard theory generalizes to AGI is at least as tenuous as the claim that it doesn’t.
I’d be interested if you elaborated on that.