Yeah, it is plausible that "the bad guy gets it first" is more important than I'm currently treating it as.
(The problem that "ignore the Bad Guy problem" is trying to solve is "it seems like people are basically only capable of thinking about the Bad Guy problem", or more specifically "people can't think about illegible problems, and the Bad Guy problem is legible AND we separately have a major bias towards thinking about it". And, idk, I'm just trying to pump against that.)
I think a motivation for early CEV / Friendly AI work was to have a target that was clearly good for all the major projects to be working towards, to reduce the need to worry about the Bad Guy problem. But I think even back in the day, something-like-corrigibility was probably still a necessary stepping stone? (Not sure what OG Eliezer/MIRI were thinking.)
This suggests that we should focus on sane institutions/governance before even trying to solve alignment. It's probably necessary for succeeding at it quickly, too.
It is nice that this just seems robustly good. Currently I am basically focused on projects that are specifically about persuading people about the x-risk problem directly, as opposed to projects trying to go about things in a more "make civilization broadly sane" way. The former seems very fraught, but also seems more like it'll actually work in time.
If you have more thoughts on any of this, I'm interested.
That intuition is there for a reason. We're spoiled, having grown up in a liberal order within which this risk is mostly overblown. However, ASI is clearly powerful enough to unilaterally overturn any such liberal order (or whatever's left of it), and it puts us into a realm that is even worse than the ancestral environment in terms of how changeable power hierarchies are, and how bad things can get if you're at the bottom.
Corrigibility and CEV are trying to solve separate problems? Not sure what your point is here; agreed on that being one of the major points of CEV.
Persuading people about x-risk enough to stop AI capability gains seems like the current best lever to me too.
I think where we disagree is that I don't think we should immediately jump into alignment when/if that succeeds, but rather that we need to focus on good governance and institutions first (and it's probably worth spending some effort trying to lay the groundwork now, particularly since this seems like an especially high-leverage moment in history for making such changes). I have some thoughts on this too if you want to move to DMs.
> Corrigibility and CEV are trying to solve separate problems? Not sure what your point is here; agreed on that being one of the major points of CEV.
If every country/person were building CEV, it wouldn't be particularly scary (from a misuse standpoint). Whereas if every country is focused on corrigibility, there will be a phase where unilateral actors can do bad stuff that you need to worry about.