Re “What is the minimum necessary and sufficient policy that you think would prevent extinction?”
This is phrased in a way that implies preventing extinction is a binary, when in reality any given policy leaves some probability of extinction. I actually don’t know what Eliezer means here: is he asking for a policy that leads to 50%, 10%, 1%, or 0% P(extinction)?
(I think that it’s common for AI safety people to talk too much about totally quashing risks rather than reducing them, in a way that leads them into unproductive lines of reasoning.)
“Don’t tell me what it is. What would change your p(doom)?”
I think that if you want to work on AI takeover prevention, you should probably have some models of the situation that give you a rough prediction for P(AI takeover). For example, I’m constantly thinking in terms of the risks conditional on scheming arising, scheming not arising, a leading AI company being responsible and having three years of lead, etc.
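As a minimal sketch of how such conditional estimates combine (using the scheming split mentioned above; this is just the law of total probability, with the symbols standing in for whatever numbers one actually believes):

P(AI takeover) = P(takeover | scheming arises) × P(scheming arises) + P(takeover | scheming doesn’t arise) × (1 − P(scheming arises))

The same pattern extends to the other conditioning events, like how responsible the leading AI company is or how long its lead lasts.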
As another note: my biggest problem with talking about P(doom) is that it’s ambiguous between several different events, like AI takeover, human extinction, human extinction due to AI takeover, and the loss of all value in the future, and it seems pretty important to separate these out because I think the numbers are pretty different.
Especially because we need to take into account non-AI X-risks. So maybe “What is the AI policy that would most reduce X-risks overall?” For people with lower P(X-risk|AGI) (if you don’t like P(doom)), longer timelines, and/or more worry about other X-risks, the answer may be to do nothing or even to accelerate AI (harkening back to Yudkowsky’s “Artificial Intelligence as a Positive and Negative Factor in Global Risk”).
“p(Doom) if we build superintelligence soon under present conditions using present techniques”
This has the issue that superintelligence won’t be developed using present techniques, and a lot of the question is about whether it’ll be okay to develop superintelligence using the different techniques that will be used at the time, so this number will potentially drastically overestimate the risk.
I of course have no problem with asking people questions about situations where the risk is higher or lower than you believe the actual risk to be. But I think that focusing on this question will lead to repeated frustrating interactions where one person quotes the answer to the stated question, and then someone else conflates that with their answer to what will happen if we continue on the current trajectory, which for some people is extremely different and for some people isn’t.
I guess the same question: “do you think there is a better question here one should ask instead? Or do you think people should really stop trying to ask questions like this in a systematic way?”
(Mostly I don’t particularly think there should be a common pDoom-ish question, but insofar as I was object-level answering your question here, an answer that feels right-ish to me is “ensure AI x-risk is no higher than around the background risk of nuclear war”.)
Seems somewhat surprising for that to be what Eliezer had in mind, given the many episodes of people saying stuff like “MIRI’s perspective makes sense if we wanted to guarantee that there wasn’t any risk” and Eliezer saying stuff like “no I’d take any solution that gave <=50% doom”. (At least that’s what I remember, though I can’t find sources now.)
I do agree that’s enough evidence to be confused and dissatisfied with my guess. I’m basing my guess more on the phrasing of the question, which sounds more like it’s just meant to be “what a reasonable person would think ‘prevent extinction’ would mean”, and the fact that Eliezer said that sort of thing in another context doesn’t necessarily mean it’s what he meant here.
I almost included a somewhat more general disclaimer of “also, this question is much more opinionated on framing”, and then didn’t end up including it in the original post. But I just edited that in.
Broadly, do you think there should be any question that’s filling the niche that p(Doom) is filling right now? (“no” seems super reasonable, but wondering if there is a question you actually think is useful for some kind of barometer-of-consensus that carves reality at the joints?)
I think “What’s your P(AI takeover)” is a totally reasonable question and don’t understand Eliezer’s problem with it (or his problem with asking about timelines). Especially given that he is actually very (and IMO overconfidently) opinionated on this topic! I think it would be crazy to never give bottom-line estimates for P(AI takeover) when talking about AI takeover risk.
(I think talking about timelines is more important because it’s more directly decision relevant (whereas P(AI takeover) is downstream of the decision-relevant variables).)
In practice the numbers I think are valuable to talk about are mostly P(some bad outcome|some setting of the latent facts about misalignment, and some setting of what risk mitigation strategy is employed). Like, I’m constantly thinking about whether to try to intervene on worlds that are more like Plan A worlds or more like Plan D worlds, and to do this you obviously need to think about the effect on P(AI takeover) of your actions in those worlds.
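A minimal sketch of the kind of bookkeeping this implies, in Python; the world labels echo the Plan A / Plan D framing above, and every probability is a made-up placeholder rather than anyone’s actual estimate:

```python
# Toy expected-impact calculation: weight the risk reduction an intervention
# buys in each hypothetical world by how likely that world seems.
# All probabilities are illustrative placeholders, not real estimates.

worlds = {
    # name: (P(world), P(takeover | world, baseline), P(takeover | world, intervention))
    "plan_a_like": (0.5, 0.10, 0.05),
    "plan_d_like": (0.5, 0.40, 0.30),
}

expected_risk_reduction = sum(
    p_world * (p_baseline - p_with_intervention)
    for p_world, p_baseline, p_with_intervention in worlds.values()
)

print(f"Expected reduction in P(AI takeover): {expected_risk_reduction:.3f}")
# -> Expected reduction in P(AI takeover): 0.075
```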
This comment reads as kind of unhelpfully intellectual to me? “What would it take to fix the door?” “Well it would involve replacing the broken hinge.” “Ah, that’s framing the door being fixed as a binary, really you should claim that replacing the broken hinge has a certain probability of fixing the door.”
I get that in life we don’t have certainties, and many good & relevant policies mitigate different parts of the risk, but I think the point is to move toward a mechanistic model of what the problem is and what its causes are, and often talking about probabilities isn’t appropriate for that and essentially just adds cognitive overhead.
I just have no idea what Eliezer’s question even means. What do you think it means?
“What is a policy that could be implemented by a particular government, or an international treaty signed by many governments, such that you would no longer think that working on preventing extinction-or-similar from AGI was the top priority (or close to the top priority) for our civilization?”
Another phrasing: “such that you would no longer be concerned that, this century, humanity would essentially lose control over the future by getting outcompeted by an AGI with totally different values?”
I think I could understand you not getting what the question means if, in your model of the future, all routes are crazy and pass through AGI takeover of some sort, and our full-time job regardless of what happens is to navigate that. Like, there isn’t cleanly a ‘safe’ world; all the worlds involve our full-time job being to deal with alignment problems of AGIs.
(Edited down to cut extraneous text.)
I’m surprised that’s the question. I would guess that’s not what Eliezer means, because he says Dath Ilan is responding sufficiently to AI risk but also hints at Dath Ilan still spending a significant fraction of its resources on AI safety (I’ve only read a fraction of the work here, so maybe I’m wrong). I have a background belief that the largest problems don’t change that much, that it’s rare for problems to go from the #1 problem to not-in-the-top-10, and that most things have diminishing returns, such that it’s not worthwhile to solve them so thoroughly. An alternative definition that’s spiritually similar, and that I like more, is: “What policy could governments implement such that improving AI x-risk policy would no longer be the #1 priority, if the governments were wise?” This isolates AI / puts it in context of other global problems, such that the AI solution doesn’t need to prevent governments from changing their minds over the next 100 years, or do whatever else needs to happen for the next 100 years to go well.
Fixing doors is so vastly easier than predicting the future that analogies and intuitions don’t transfer.
Compare someone asking in 1875, 1920, 1945, or 2025, “What is the minimum necessary and sufficient policy that you think would prevent Germany from invading France in the next 50 years?” The problem is non-binary, there are no guarantees, and even the definitions are treacherous. I wouldn’t ask the question that way.
Instead I might ask “what policies best support peace between France and Germany, and how?”, so we can talk mechanistically without the distraction of “minimum”, “necessary”, “sufficient”, and “prevent”.
Separately, I do not want anyone to be thinking of minimum policies here. There is no virtue in doing the minimum necessary to prevent extinction.