There’s no reason that a red-thing-ist category can’t be narrowly useful. Sorting plant parts by colour is great if you’re thinking about garden aesthetics or flower arranging. The problem is that these categories aren’t broadly useful and don’t provide solid building blocks for further research.
At the moment, a lot of agendas focus on handling roughly human-level AI. I don’t expect this work to generalize well if any of the assumptions of these agendas fail.
Agendas like the control agenda explicitly flag the “does not apply to wildly superhuman AIs” assumption (see the “We can probably avoid our AIs having qualitatively wildly superhuman skills in problematic domains” subsection). Are there any assumptions that you think make the concept of “scheming AIs” less useful and that are not flagged by the post I linked to?
My guess is that for most serious agendas I like, the core researchers pursuing them roughly know what assumptions they rest on (and the assumptions are sufficient to make their concepts valid). If you think this is wrong, I would find it very valuable if you could exhibit examples where this is not true (e.g. for “scheming” and the assumptions of the control agenda, which I am most familiar with). Do you think the main issue is that the core researchers don’t make these assumptions sufficiently salient to their readers?