Superintelligence and other classic presentations of AI risk definitely offer additional arguments/considerations. The likelihood of extremely discontinuous/localized progress is, of course, the most prominent one.
Perhaps what is going on here is that the arguments as stated in brief summaries like ‘orthogonality thesis + instrumental convergence’ just aren’t what the arguments actually were, and that there were from the start all sorts of empirical or more specific claims made around these general arguments.
This reminds me of Lakatos’ theory of research programs—where the core assumptions, usually logical or a priori in nature, are used to ‘spin off’ secondary hypotheses that are more empirical or easily falsifiable.
Lakatos’ model fits AI safety rather well: the orthogonality thesis (OT) and instrumental convergence (IC) are among the non-empirical ‘hard core’ assumptions that are foundational to the research program. In ~2010 the secondary assumptions included discontinuous progress and the idea that an AI maximises a simple utility function; in ~2020 we have a different set of secondary assumptions: mesa-optimisers, ‘you get what you measure’, and direct evidence of current misalignment.