I think I have already answered that: I don’t think anyone is going to deliberately build something they can’t control at all. So the probability of mass extinction depends on creating an uncontrollable superintelligence accidentally—for instance, by rapid recursive self-improvement. And RRSI, a.k.a. Foom Doom, is a conjunction of claims, all of which are p<1, so it is not high probability.
I agree that probability mostly depends on accidental AGI. I don’t agree that probability mostly depends on (very) hard takeoff. I believe probability mostly depends on just “AGI being smarter than all of humanity”. If you have a kill-switch or whatever, an AGI without Alignment theory being solved is still “the most dangerous technology with the worst safety and the worst potential to control it”.
So, could you go into more cruxes of your beliefs, more context? (More or less the full context of my own beliefs is captured by the previous comment. But I’m ready to provide more if needed.) To provide more context to your beliefs, you could try answering “what’s the worst disaster (below everyone being dead) an AGI is likely to cause” or “what’s the best benefit an AGI is likely to give”. To make sure you aren’t treating an AGI as impotent in negative scenarios and as a messiah in positive scenarios. And to make sure you aren’t treating humans as incapable of sinking even a safe, non-sentient boat or of refusing to vaccinate against viruses.
“the most dangerous technology with the worst safety and the worst potential to control it”.
That doesn’t imply a high probability of mass extinction.
To provide more context to your beliefs, you could try answering “what’s the worst disaster (below everyone being dead) an AGI is likely to cause” or “what’s the best benefit an AGI is likely to give”.
Why? I’m saying p(doom) is not high. I didn’t mention P(otherstuff). You seem to be motte-and-baileying.
Why? I’m saying p(doom) is not high. I didn’t mention P(otherstuff).
To be able to argue something (/decide how to go about arguing something), I need to have an idea about your overall beliefs.
That doesn’t imply a high probability of mass extinction.
Could you clarify what your own opinion even is? You seem to agree that rapid self-improvement would mean likely doom. But you aren’t worried about gradual self-improvement or AGI being dangerously smart without much (self-)improvement?
To be able to argue something (/decide how to go about arguing something), I need to have an idea about your overall beliefs.
No, for me to argue something I only need to state the premises relevant to the conclusion, which in this case are:
the case for a high probability of existential doom is a complex conjunctive argument
the laws of probability.
Logic isn’t holistic.
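[For illustration of the probability step being invoked here, stated generally and not as either commenter’s estimate: a conclusion that rests on a conjunction of premises can be no more probable than its least probable premise, and if the premises are roughly independent their probabilities multiply:

$$P(A_1 \land A_2 \land \dots \land A_n) = \prod_{i=1}^{n} P(A_i) \le \min_i P(A_i).$$

So even moderately confident premises compound into a much less probable conjunction.]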
But you aren’t worried about gradual self-improvement or AGI being dangerously smart without much (self-)improvement?
It’s not black and white. I don’t think they are zero risk, and I don’t think it is Certain Doom, so it’s not what I am talking about. Why are you bringing it up? Do you think there is a simpler argument for Certain Doom?
I believe in likely doom and I don’t think the burden of proof is on “doomers”.
Doom meaning what? It’s obvious that there is some level of risk, but some level of risk isn’t Certain Doom. Certain Doom is an extraordinary claim, and the burden of proof therefore is on (certain) doomers. But you seem to be switching between different definitions.
I agree that probability mostly depends on accidental AGI. I don’t agree that probability mostly depends on (very) hard takeoff. I believe probability mostly depends on just “AGI being smarter than all of humanity”. If you have a kill-switch or whatever, an AGI without Alignment theory being solved is still “the most dangerous technology with the worst safety and the worst potential to control it”.
Saying “the most dangerous technology with the worst safety and the worst potential to control it” doesn’t actually imply a high probability of doom (p > 0.9) or a high level of risk (> 90% dead); it’s only a relative statement.
To make sure you aren’t treating an AGI as impotent in negative scenarios
My stated argument has nothing to do with the AGI being impotent given all the premises in the doom argument.
Informal logic is more holistic than not, I think, because it relies on implicit assumptions.
It’s not black and white. I don’t think they are zero risk, and I don’t think it is Certain Doom, so it’s not what I am talking about. Why are you bringing it up? Do you think there is a simpler argument for Certain Doom?
Could you proactively describe your opinion? Or re-describe it, by adding relevant details. You seemed to say “if hard takeoff, then likely doom; but hard takeoff is unlikely, because hard takeoff requires a conjunction of things to be true”. I answered that I don’t think hard takeoff is required. You didn’t explain that part of your opinion. Now it seems your opinion is more general (not focused on hard takeoff), but you refuse to clarify it. So, what is the actual opinion I’m supposed to argue with? I won’t try to use every word against you, so feel free to write more.
Doom meaning what? It’s obvious that there is some level of risk, but some level of risk isn’t Certain Doom. Certain Doom is an extraordinary claim, and the burden of proof therefore is on (certain) doomers. But you seem to be switching between different definitions.
I think “AGI is possible” or “AGI can achieve extraordinary things” is the extraordinary claim. The worry about its possible extraordinary danger is natural. Therefore, I think AGI optimists bear the burden of proving that (a) the likely risk from AGI is bounded by something and (b) AGI can’t amplify already existing dangers.
By “likely doom” I mean likely (near-)extinction. “Likely” doesn’t have to be 90%.
Saying “the most dangerous technology with the worst safety and the worst potential to control it” doesn’t actually imply a high probability of doom (p > 0.9) or a high level of risk (> 90% dead); it’s only a relative statement.
I think it does imply so, modulo “p > 90%”. Here’s a list of the most dangerous phenomena: (L1)
Nuclear warfare. World wars.
An evil and/or suicidal world-leader.
Deadly pandemics.
Crazy ideologies, e.g. fascism. Misinformation. Addictions. People being divided on everything. (Problems of people’s minds.)
And a list of the most dangerous qualities: (L2)
Being superintelligent.
Wanting, planning to kill everyone.
Having a cult-following. Humanity being dependent on you.
Having direct killing power (like a deadly pandemic or a set of atomic bombs).
Multiplicity/simultaneity. E.g. if we had TWO suicidal world-leaders at the same time.
Things from L1 can barely scrape together two points from L2, yet they can cause mass disruptions and claim many victims and also trigger each other. Narrow AI could secure three points from L2 (narrow superintelligence + cult-following, dependency + multiplicity/simultaneity) — weakly, but potentially better than a powerful human ever could. However, AGI can easily secure three points from L2 in full. Four points, if AGI is developed in more than a single place. And I expect you to grant that general superintelligence presents a special, unpredictable danger.
Given that, I don’t see what should bound the risk from AGI or prevent it from amplifying already existing dangers.
Informal logic is more holistic than not, I think, because it relies on implicit assumptions.
Straining after implicit meanings can cause you to miss explicit ones.
You seemed to say “if hard takeoff, then likely doom; but hard takeoff is unlikely, because hard takeoff requires a conjunction of things to be true”. I answered that I don’t think hard takeoff is required.
I think it’s needed for the “likely”. Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn’t that obvious?
By “likely doom” I mean likely (near-)extinction. “Likely” doesn’t have to be 90%.
I do mean >90%. If you mean something else, you are probably talking past me.
[lists of various catastrophes, many of which have nothing to do with AI]
Why are you doing this? I did not say there is zero risk of anything.
Given that, I don’t see what should bound the risk from AGI or prevent it from amplifying already existing dangers.
Are you using “risk” to mean the probability of the outcome, or the impact of the outcome?
[lists of various catastrophes, many of which have nothing to do with AI]
Why are you doing this? I did not say there is zero risk of anything. (...) Are you using “risk” to mean the probability of the outcome, or the impact of the outcome?
My argument is based on comparing the phenomenon of AGI to other dangerous phenomena. The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the impact of the outcome can kill most humans.
I think it’s needed for the “likely”. Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn’t that obvious?
To me the likelihood doesn’t go down enough (to the tolerable levels).
The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the impact of the outcome can kill most humans.
But I am not saying that doom is unlikely given superintelligence and misalignment; I am saying the argument that gets there (superintelligence + misalignment) is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.
To me the likelihood doesn’t go down enough (to the tolerable levels).
I’ve confused you with people who deny that a misaligned AGI is even capable of killing most humans. Glad to be wrong about you.
But I am not saying that doom is unlikely given superintelligence and misalignment; I am saying the argument that gets there (superintelligence + misalignment) is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.
But I don’t agree that it’s highly conjunctive.
If AGI is possible, then its superintelligence is a given. Superintelligence isn’t given only if AGI stops at human level of intelligence + can’t think much faster than humans + can’t integrate abilities of narrow AIs naturally. (I.e. if AGI is basically just a simulation of a human and has no natural advantages.) I think most people don’t believe in such AGI.
I don’t think misalignment is highly conjunctive.
I agree that hard takeoff is highly conjunctive, but why is “superintelligence + misalignment” highly conjunctive?
I think it’s needed for the “likely”. Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn’t that obvious?
If AGI is AGI, there won’t be any problems to notice. That’s why I think probability doesn’t decrease enough.
...
I hope that Alignment is much easier to solve than it seems. But I’m not sure (a) how much weight to put into my own opinion and (b) how much my probability of being right decreases the risk.
why is “superintelligence + misalignment” highly conjunctive?
In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.
What opinion are you currently arguing? That the risk is below 90% or something else? What counts as “high probability” for you?
Incorrigible misalignment is at least one extra assumption.
I think “corrigible misalignment” doesn’t exist; corrigible AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give examples of scenarios you’re thinking about. The examples can be very abstract.
If AGI is AGI, there won’t be any problems to notice
Huh?
I mean, you haven’t explained what “problems” you’re talking about. AGI suddenly declaring “I think killing humans is good, actually” after looking aligned for 1 year? If you didn’t understand my response, a more respectful answer than “Huh?” would be to clarify your own statement. What noticeable problems did you talk about in the first place?
Please, proactively describe your opinions. Is it too hard to do? Conversation takes two people.
I think corrigibility is “AGI doesn’t try to kill everyone and doesn’t try to prevent/manipulate its modification”. Therefore, in some global sense such AGI is aligned at every point in time. Even if it causes a local disaster.
Over 90%, as I said.
Then I agree, thank you for re-explaining your opinion. But I think other probabilities count as high too.
To me, the ingredients of danger (but not “> 90%”) are these:
1st. AGI can be built without Alignment/Interpretability being solved. If that’s true, building AGI slowly or being able to fix visible problems may not matter that much.
2nd and 3rd. AGI can have planning ability. AGI can come up with a goal the pursuit of which would kill everyone.
2nd (alternative). AIs and AGIs can kill most humans without real intention of doing so, by destabilizing the world/amplifying already existing risks.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many, many times smarter than humanity: like humanity is smarter than ants/rats/chimps). Haven’t you forgotten to add that assumption?
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term… “correctable”. If an AI were fully aligned, there would be no need to correct it.
Yes, there are dangers other than a high probability of killing almost everyone. I didn’t say there aren’t. But it’s motte-and-baileying to fall back to “what about these lesser risks”.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many, many times
Yes, and that’s the specific argument I am addressing, not AI risk in general.
Except that if it’s many, many times smarter, it’s ASI, not AGI.
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term… “correctable”. If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected or (b) we cause new, less noticeable problems while correcting it.
So, I think there are not two assumptions (“alignment/interpretability is not solved” + “AGI is incorrigible”), but only one: “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
Yes, and that’s the specific argument I am addressing, not AI risk in general.
Except that if it’s many, many times smarter, it’s ASI, not AGI.
I disagree that “doom” and “AGI going ASI very fast” are certain (> 90%) too.
I do mean >90%. If you mean something else, you are probably talking past me.
Yes, I probably mean something other than “>90%”.
To me the likelihood doesn’t go down enough (to the tolerable levels).
Why not?
If AGI is AGI, there won’t be any problems to notice. That’s why I think probability doesn’t decrease enough.
It needs to happen quickly or surreptitiously to be a problem.
I think “corrigible misalignment” doesn’t exist; corrigible AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give examples of scenarios you’re thinking about. The examples can be very abstract.
It’s not aligned at every possible point in time.
I’m talking about the Foom scenario that has been discussed endlessly here.
The complete argument for Foom Doom is that:
the AI will have goals/values in the first place (it won’t be a tool like GPT*),
the values will be misaligned, however subtly, in ways unfavorable to humanity,
the misalignment cannot be detected or corrected,
the AI can achieve value stability under self-modification,
the AI will self-modify in a way too fast to stop,
and most misaligned values in the resulting ASI are highly dangerous.
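[A minimal numerical sketch of the conjunction arithmetic behind this list, assuming the six premises are roughly independent; the probabilities below are placeholders for illustration, not estimates made by either commenter:]

```python
from math import prod

# Placeholder probabilities for the six Foom Doom premises listed above.
# The numbers are illustrative only; substitute your own estimates.
premises = {
    "AI has goals/values at all": 0.8,
    "those values are (subtly) misaligned": 0.8,
    "misalignment cannot be detected or corrected": 0.7,
    "value stability under self-modification": 0.7,
    "self-modification too fast to stop": 0.6,
    "resulting misaligned ASI is highly dangerous": 0.9,
}

# Under rough independence the conjunction is the product of its parts,
# so even fairly high per-premise probabilities shrink the total.
p_all = prod(premises.values())
print(f"P(all six premises hold) ~= {p_all:.2f}")  # about 0.17
```

[The sketch only shows the multiplication step; it says nothing about whether the premises really are independent or what their true probabilities are.]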