I don’t really buy this “doom is clearly the default” frame. I’m not sure how important this is, but I thought I would express my perspective.
> But all of those stories look totally wild to me, and it’s extremely difficult to see the mechanisms by which they might come to pass.
A reasonable fraction of my non-doom worlds look like:
1. AIs don’t end up scheming (in the vast majority of contexts, that is) until somewhat after the point where AIs dominate top human experts at ~everything, because scheming ends up being unnatural in the relevant paradigm (after moderate status quo iteration). I guess I put around 60% on this.
2. We have a decent amount of time at roughly this level of capability and people use these AIs to do a ton of stuff. People figure out how to get these AIs to do decent-ish conceptual research and then hand off alignment work to these systems. (Perhaps because there was a decent amount of transfer from behavioral training on other things to actually trying at conceptual research and doing a decent job.) People also get advice from these systems. This goes fine given the amount of time and only a modest amount of effort, and we end up in an “AIs work on furthering alignment” attractor basin.
In aggregate, I guess something like this conjunction is maybe 35% likely. (There are other sources of risk which can still occur in these worlds, to be clear, like humanity collectively going crazy.) And then you get another fraction of probability mass from worlds where the first condition or the second holds only in a weaker form, and which therefore require somewhat more effort on the part of humanity.
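Spelled out (roughly, and treating the 35% as approximately the product of the two conditions above rather than as an independent gut number), the implied conditional credence is:

$$
P(2 \mid 1) \;\approx\; \frac{P(1 \wedge 2)}{P(1)} \;\approx\; \frac{0.35}{0.6} \;\approx\; 0.58
$$

That is, a bit under 60% that the handoff and the resulting attractor basin work out, conditional on scheming not having shown up by the handoff point.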
So, from my perspective, “early-ish alignment was basically fine and handing off work to AIs was basically fine” is the plurality scenario and feels kinda like the default? Or at least it feels more like a coin toss than like doom being the clear default.
> AIs don’t end up scheming (in the vast majority of contexts, that is) until somewhat after the point where AIs dominate top human experts at ~everything, because scheming ends up being unnatural in the relevant paradigm (after moderate status quo iteration). I guess I put around 60% on this.
I would love to read an elucidation of what leads you to think this.