Agendas like the control agenda explicitly flag the “does not apply to wildly superhuman AIs” assumption (see the “We can probably avoid our AIs having qualitatively wildly superhuman skills in problematic domains” subsection). Are there any assumptions that you think make the concept of “scheming AIs” less useful and that are not flagged by the post I linked to?
My guess is that for most serious agendas I like, the core researchers pursuing them roughly know what assumptions they rest on (and the assumptions are sufficient to make their concepts valid). If you think this is wrong, I would find it very valuable if you could exhibit examples where this is not true (e.g. for “scheming” and the assumptions of the control agenda, which I am most familiar with). Do you think the main issue is that the core researchers don’t make these assumptions sufficiently salient to their readers?