Fair. What would you call a “mainstream ML theory of cognition”, though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis[1]).
judging by how bad humans are at [consistent decision-making], and how much they struggle to do it, they probably weren’t optimized too strongly biologically to do it. But memetically, developing ideas for consistent decision-making was probably useful, so we have software that makes use of our processing power to be better at this
Roughly agree, yeah.
But all of this is still just one piece on the Jenga tower
I kinda want to push back against this repeat characterization – I think quite a lot of my model’s features are “one storey tall”, actually – but that probably won’t be a productive use of either of our time. I’ll get around to the “find papers empirically demonstrating various features of my model in humans” project at some point; that should be a better starting point for discussion.
[1] Which, yeah, I think is false: scaling LLMs won’t get you to AGI. But it’s also kinda unfalsifiable using empirical methods, since you can always claim that another 10x scale-up will get you there.
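(For concreteness: the empirical case for the scaling hypothesis rests on smooth power-law fits of loss against scale, à la Kaplan et al. 2020. Below is a minimal sketch of why that’s so hard to falsify, using roughly the constants reported there as placeholders – the fitted curve promises a nonzero improvement at every scale, so any single underwhelming run stays consistent with “just scale another 10x”.)

```python
# Toy illustration of a Kaplan-et-al.-style parameter scaling law,
# L(N) = (N_c / N) ** alpha. Constants are roughly the reported fits;
# treat them as placeholders, not gospel.
N_C = 8.8e13    # fitted "critical" parameter count
ALPHA = 0.076   # fitted scaling exponent

def predicted_loss(n_params: float) -> float:
    """Predicted test loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA

for n in [1e9, 1e10, 1e11, 1e12]:
    # Every 10x scale-up buys another small but real predicted improvement,
    # so no finite disappointing result refutes "scale more".
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```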
Fair. What would you call a “mainstream ML theory of cognition”, though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis).
It tends not to get talked about much today, but there was the PDP (connectionist) camp of cognition vs. the camp of “everything else” (including ideas such as symbolic reasoning, etc.). The connectionist camp built a rough model of how they thought cognition worked; a lot of cognitive scientists scoffed at it. Hinton tried putting it into actual practice, and it took several decades for it to be demonstrated to actually work. I think a lot of people were confused about why the “stack more layers” approach kept working, but under the model of connectionism, this is expected. Connectionism is kind of too general to make great predictions, but it doesn’t seem to allow for FOOM-type scenarios. It also seems to favor modeling agents as local-optimum satisficers rather than greedy utility maximizers.
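(To make that last contrast concrete, here’s a minimal sketch – my gloss, not anything from the PDP literature – of the two decision rules: a greedy maximizer exhaustively searches for the global argmax, while a satisficer in Herbert Simon’s sense settles for the first option that clears a “good enough” threshold.)

```python
import random

def maximize(options, utility):
    """Greedy utility maximizer: exhaustively search for the global argmax."""
    return max(options, key=utility)

def satisfice(options, utility, threshold):
    """Satisficer: settle for the first option that clears the threshold."""
    for option in options:
        if utility(option) >= threshold:
            return option
    return options[-1]  # nothing was good enough; keep the last one seen

random.seed(0)
options = [random.random() for _ in range(10)]
utility = lambda x: x

print("maximizer picks: ", maximize(options, utility))        # global best
print("satisficer picks:", satisfice(options, utility, 0.7))  # first "good enough"
```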
that should be a better starting point for discussion
Agreed. Working on it.