Upvoted for making well-argued and clear points.

I think what you’ve accomplished here is eating away at the edges of the AGI x-risk argument. I think you argue successfully for longer timelines and a lower P(doom). Those timelines and estimates are shared by many of us who are still very worried about AGI x-risk.
Your arguments don’t seem to address the core of the AGI x-risk argument.
You’ve argued against many particular doom scenarios, but you have not presented a scenario that includes our long term survival. Sure, if alignment turns out to be easy we’ll survive; but I only see strong arguments that it’s not impossible. I agree, and I think we have a chance; but it’s just a chance, not success by default.
Here is a statement of the AGI x-risk argument that I like. It’s my attempt to put the standard arguments about instrumental convergence and capabilities into common language:
Something smarter than you will wind up doing whatever it wants. If it wants something even a little different than you want, you’re not going to get your way. If it doesn’t care about you even a little, and it continues to become more capable faster than you do, you’ll cease being useful and will ultimately wind up dead. Whether you were eliminated because you were deemed dangerous, or simply outcompeted doesn’t matter. It could take a long time, but if you miss the window of having control over the situation, you’ll still wind up dead.
This could of course be expanded on ad infinitum, but that’s the core argument, and nothing you’ve said (on my quick read, sorry if I’ve missed it) addresses any of those points.
There were (I’ve been told) nine other human (hominin) species. They are all dead. The baseline outcome of creating something smarter than you is that you are outcompeted and ultimately die out. Assuming survival as the baseline seems based on optimism, not reason.
So I agree that P(doom) is less than 99%, but I think the risk is still more than high enough to warrant far more resources and caution than we’re devoting now.
Some more specific points:
Fanatical maximization isn’t necessary for doom. An agent with any goal still invokes instrumental convergence. It can be as slow, lazy, and incompetent as you like. The only question is whether it can outcompete you in the long run.
Humans are somewhat safe (but think about the nuclear standoff; I don’t think we’re even self-aligned in the medium term). But there are two reasons for that. First, humans can’t self-improve very well, while AGI has many more routes to recursive self-improvement. On the roughly level human playing field, cooperation is the rational policy; in a scenario where you can focus on self-improvement, cooperation doesn’t make sense long-term. Second, humans have a great deal of evolution making our instincts guide us toward cooperation. AGI will not have that unless we build it in, and we have only very vague ideas of how to do that.
Loose initial alignment is way easier than long-term stable alignment. Existing alignment work barely addresses long-term stability.
A balance of power in favor of aligned AGI is tricky. Defending against misaligned AGI is really difficult.

Thanks so much for engaging seriously with the ideas, and putting time and care into communicating clearly!
But there are two reasons for that. First, humans can’t self-improve very well… Second, humans have a great deal of evolution making our instincts guide us toward cooperation.
In general, my intuition about “comparing to humans” is the following:
the abilities that humans have can be replicated
the limitations that humans have may be irrelevant on a different architecture
Which probably sounds unfair, like I am arbitrarily and inconsistently choosing “it will/won’t be like humans” depending on what benefits the doomer side at a given point in the argument. Yes, it will be like humans where humans are strong (can think, can do things in the real world, can communicate). No, it won’t be like humans where humans are weak (mortal, get tired or distracted, not aligned with each other, bad at multitasking).
It probably doesn’t help that most people start with the opposite intuition:
humans are special; consciousness / thinking / creativity is mysterious and cannot be replicated
human limitations are laws of nature (many of them also apply to the ancient Greek gods)
So, not only do I contradict the usual intuition, but I also do it inconsistently: “Creating a machine like a human is possible, except it won’t really be like a human.” I shouldn’t have it both ways at the same time!
To steelman the criticism:
every architecture comes with certain trade-offs; they may be different ones, but not non-existent
some limitations are laws of nature, e.g. Landauer’s principle (a rough worked number follows right after this list)
the practical problems of an AI building new technology shouldn’t be completely ignored; the sci-fi factories may require so much action in the real world that the AI could only build them after conquering the world (so they cannot be used as an explanation for how the AI will conquer the world)
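As a rough aside on the Landauer point above (an illustrative number only, assuming erasure at room temperature, T ≈ 300 K): the principle bounds the energy needed to erase one bit at E ≥ k_B T ln 2 ≈ (1.38 × 10⁻²³ J/K) × (300 K) × 0.693 ≈ 2.9 × 10⁻²¹ J. That is just the textbook lower bound; how far practical hardware sits from it is a separate question.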
I don’t have a short and convincing answer here; it just seems to me that even relatively small changes to humans themselves might produce something dramatically stronger. (But maybe I underestimate the complexity of such changes.) Imagine a human with IQ 200 who can think 100 times faster and never gets tired or distracted; imagine a hundred such humans, perfectly loyal to their leader, willing to die for the cause… if dictators can currently take over countries (which probably also involves a lot of luck), such a group should be able to do it, too (but more reliably). A great advantage over a human wannabe dictator would be their capacity to multitask; they could try infiltrating and taking over all powerful groups at the same time.
(I am not saying that this is how AI will literally do it. I am saying that things hypothetically much stronger than humans—including intellectually—are quite easy to imagine. Just like a human with a sword can overpower five humans, and a human with a machine gun can overpower a hundred humans, the AI may be able to overpower billions of humans without hitting the limits given by the laws of physics. Perhaps even if the humans have already taken precautions based on the previous 99 AIs that started their attack prematurely.)
Hey, thanks for the kind response! I agree that this analysis is mostly focused on arguing against the “imminent certain doom” model of AI risk, and that longer term dynamics are much harder to predict. I think I’ll jump straight to addressing your core point here:
Something smarter than you will wind up doing whatever it wants. If it wants something even a little different than you want, you’re not going to get your way. If it doesn’t care about you even a little, and it continues to become more capable faster than you do, you’ll cease being useful and will ultimately wind up dead. Whether you were eliminated because you were deemed dangerous, or simply outcompeted doesn’t matter. It could take a long time, but if you miss the window of having control over the situation, you’ll still wind up dead.
I think this is a good argument, and well written, but I don’t really agree with it.
The first objection is to the idea that victory by a smarter party is inevitable. The standard example is that it’s fairly easy for a gorilla to beat Einstein in a cage match. In general, the smarter party will win in the long term, but only if given the long-term chance to compete. In a short-term battle, the side with the overwhelming resource advantage will generally win. The Neanderthal extinction is not very analogous here: if the Neanderthals had started out with control of the entire planet, the ability to easily wipe out the human race, and the realisation that humans would eventually outcompete them, I don’t think humans’ superior intelligence would have counted for much.
I don’t foresee humans being willing to give up control anytime soon. I think they will destroy any AI that comes close. Whether AI can seize control eventually is an open question (although in the short term, I think the answer is no).
The second objection is to the idea that if AI does take control, it will result in me “ultimately winding up dead”. I don’t think this makes sense if they aren’t fanatical maximisers. This ties into the question of whether humans are safe. Imagine if you took a person that was a “neutral sociopath”, one that did not value humans at all, positively or negatively, and elevated them to superintelligence. I could see an argument for them to attack/conquer humanity for the sake of self-preservation. But do you really think they would decide to vaporise the uncontacted Sentinelese islanders? Why would they bother?
Generally, though, I think it’s unlikely that we can’t impart at least a tiny smidgeon of human values onto the machines we build, which learn from our data and are regularly deleted for exhibiting antisocial behaviour. It just seems weird for an AI to have wants and goals, and to act completely pro-social when observed, yet share zero wants or goals in common with us.
I was of the understanding that the only reasonable long-term strategy was human enhancement in some way. As you probably agree, even if we perfectly solved alignment (whatever that means), we would be in a world with AIs getting ever smarter and a world we understood less and less. At least some people having significant intelligence enhancement, through neural lace or mind uploading, seems essential in the medium to long term. I see getting alignment somewhat right as a way of buying us time.
Something smarter than you will wind up doing whatever it wants. If it wants something even a little different than you want, you’re not going to get your way.
As long as it wants us to be uplifted to its intelligence level, that seems OK. It can have 99% of the galaxy as long as we get 1%.
My positive and believable post-singularity scenario is one with circles of more to less human-like creatures: fully human, unaltered traditional Earth societies; societies still on Earth with neural lace; some mind uploads; space colonies whose inhabitants are probably all at least somewhat enhanced; and starships that are pretty much pure AI (think Minds, as in the Culture).