I’m on the optimists discord, and I do make the above argument explicitly in this presentation (e.g. slide 4): Reasons for optimism about superalignment (though, fwiw, I don’t know if I’d go all the way down to 1% p(doom); I have probably updated from something like 10% to <5%, and most of my uncertainty now comes from the governance / misuse side).
On your points ‘Is massive effective acceleration enough?’ and ‘Will “human level” systems be sufficiently controlled to get enough useful work?’: conditional on aligned-enough, ~human-level automated alignment researchers, I think the answers are very likely yes, because it should be possible to get a very large amount of work out of those systems even in a very brief span of time—e.g. a couple of months (feasible with e.g. a coordinated pause, or even just a sufficient lead). See e.g. slides 9 and 10 of the above presentation (and I’ll note this argument isn’t new; it’s been made in various similar forms by e.g. Ajeya Cotra, Lukas Finnveden, and Jacob Steinhardt).
I’m generally reasonably optimistic about using human-level-ish systems to do a ton of useful work while simultaneously avoiding most of the risk from those systems. But I think this requires substantial effort and won’t clearly go well by default.
Have you had any p(doom) updates since then or is it still around 5%?
Mostly the same, with perhaps a minor positive update on the technical side (basically, from systems getting somewhat stronger—e.g. closer to automating AI safety research—while still not showing very dangerous capabilities: ASL-3, prerequisites to scheming, etc.). My views are even more uncertain / unstable on the governance side, though, which probably makes my overall p(doom) (including e.g. stable totalitarianism, s-risks, etc.) more like 20% than 5%. (I was probably mostly thinking intuitively of extinction risk only when giving the 5% figure a year ago; overall my median probably hasn’t changed much, but I now have more variance, coming from the governance side.)
If it’s not a big ask, I’d really like to know your views on the control-by-power-hungry-humans side of AI risk.
For example, the first company to create intent-aligned AGI would wield incredible power over the rest of us. I don’t think I could trust any of the current leading AI labs to use that power fairly, and I don’t think such a lab would voluntarily give up control over it either (intuitively, it would take quite something for anyone to give up such a source of power). Is there anything that can be done to prevent such a scenario?
I’m very uncertain and feel somewhat out of my depth on this. I do have quite some hope, though, from arguments like those in https://aiprospects.substack.com/p/paretotopian-goal-alignment.