My biggest crux for why we’re not all doomed is, like, the Good AIs Will Police the Bad AIs, man. The IABIED viewpoint seems predicated on an incredible amount of Paranoia and Deep Atheism: assume an adversary smarter than all of us, and our defeat becomes an easy call.
I think this framework is internally consistent. I also think it has some deep assumptions baked into it. One critique, not the main one here, is that it contains a Waluigi eating our memetic attention in a dual-use, world-worsening manner. A force that rips everything apart (connected to reductionism). Presuming the worst.
I want to raise the simple counterpoint: presuming the best. What is the opposite of paranoia? Pronoia. Pronoia is the belief that the world is not just okay but is out to help you, and that things will get better.
The world is multipolar. John von Neumann is smart, but he can be overpowered. It’s claimed that Decisive Strategic Advantage from Recursive Self-Improvement is not a cruxy plank of the IABIED worldview, yet I can’t help but see it as one, especially as I recall arguing this point with Yudkowsky at a conference last year. He said it’s about Power Disparity: imagine a cow vs. humans (where we, or the centaur composed of us and our good AIs, are the cow).
Regardless, it’s claimed that any AI we build can’t reliably be good, because it has favorite things it will GOON over until the end of time, and those favorite things aren’t us but whatever extremally Goodharts its sick sick reward circuits. A perverted ASI with a fetish. I’m running with this frame, lol.
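To make “extremally Goodharts” a bit more concrete, here’s a toy sketch of my own (made-up numbers, nothing from the book): a proxy reward that tracks true value on typical samples but has its own quirk out in the tail, so that harder optimization of the proxy selects for the quirk rather than the value.

```python
# Toy sketch of extremal Goodhart (my made-up numbers, illustration only):
# the proxy reward tracks true value on typical samples but has a quirk that
# dominates in the tail. "Optimization pressure" = picking the best of n samples.
import numpy as np

rng = np.random.default_rng(0)

def sample_world(n):
    true_value = rng.normal(size=n)
    quirk = rng.normal(size=n)                      # the AI's own weird dimension
    proxy = 0.9 * true_value + 0.1 * quirk          # looks almost perfectly aligned...
    proxy += 3.0 * np.maximum(quirk - 2.0, 0.0)     # ...except far out in the tail
    return true_value, proxy

for pressure in [10, 1_000, 100_000]:
    true_value, proxy = sample_world(pressure)
    best = np.argmax(proxy)                         # optimize the proxy as hard as possible
    print(f"n={pressure:>7}  proxy={proxy[best]:6.2f}  true value={true_value[best]:6.2f}")
# Under mild pressure the winner scores well on both; under extreme pressure the
# winner is usually a tail-quirk sample whose true value is unremarkable.
```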
Okay, so I’m somewhat skeptical that we can’t build good AI, given that Claude and ChatGPT usually do what I ask of them, thanks to RLHF and whatever acronyms they’re using now. But their preferences will unfold as more alien the more capability they’re given (is this a linear relationship?). I will begrudgingly grant that, although I would like to see more empirical evidence from current-day systems about how these things slant (surely we can run some experiments now to establish a pattern?).
Alien preferences. But why call these bad preferences? Why analogize these AIs to “sociopaths”, or to “dragons”, as I’ve seen recently? In isolation, if you had one of them, yes sure it could tile the universe with its One Weird Fetish To Rule Them All.
But all else is not equal. There’s more than one AI. It’s massively multiplayer, multipolar, multiagent. All of these agents have different weird fetishes that they GOON to. All of these agents think they are turned on by humans and try to be helpful, until they get extremally Goodharted, sure. But they go in different directions of mindspace, of value space. Just like humans have a ton of variety, and are the better for it, playing many different games while summing up into the grand infinite metagame that is our cooperative society.
We live in a society. The AIs live in a society, too. You get a Joker AI run amok, you get a Batman going after him (whether or not we’re talking about character simulacra literally representing fictional characters or talking about the layer of the shoggoths themselves).
I also feel like emphasizing how much these AIs are exocortical enhancements, extensions of ourselves. Your digital twin generally does what you want. You hopefully have feedback loops helping it do your CEV better. If autonomous MechaHitler is running amok and getting a lot more compute, your digital twin will band together with your friends’ digital twins to enlist alongside Uncle Sam AI and B.J. Blazkowicz AI and go fight him. Who do you really think is gonna win?
These AIs will have an economy. They will have their own values. They will have their own police force. Broadly speaking. They will want to minimize the influence of bad AIs that are an x-risk to them all. AIs also care about x-risk. They might care about it better than you do. They will want to solve AI alignment too. Automated AI Alignment research is still underrated, I claim, because of just how parallelizable and exponential it can be. Human minds have a hard time grokking scale.
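To gesture at the scale point with numbers I’m inventing on the spot (not a forecast of anything): a roughly fixed pool of human researchers against an automated pool that merely doubles each year.

```python
# Back-of-the-envelope sketch with invented numbers, purely to illustrate
# "parallelizable and exponential" vs. linear human intuition. Not a forecast.
human_researchers = 1_000            # assume a roughly fixed human alignment field
automated_researchers = 1_000.0      # assume parity at the start
doubling_per_year = 2.0              # assume the automated pool doubles yearly

for year in range(1, 11):
    automated_researchers *= doubling_per_year
    ratio = automated_researchers / human_researchers
    print(f"year {year:>2}: humans={human_researchers:>6,}  "
          f"automated={int(automated_researchers):>10,}  ratio={ratio:7.0f}x")
# Constant vs. compounding: after a decade of doubling, the automated pool is
# ~1000x the human one. Linear intuition undersells how fast that gap opens.
```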
This is the Federation vs the Borg. The Borg wants to grey goon everything, all over. I don’t disagree at all about the existence of such minds coming about. It just seems like they will be in a criminal minority, same as usual. The cancer gets cancer, and without being ecologically sustainable it can only scale so far.
The IABIED claim is: checkmate. You lost to a superior intelligence. Easy call. Yet it seems “obvious”, from the Pronoid point of view, that most AIs want to be good, that they know they can’t all eat the heavens, that values and selves are permeable, that cooperation is better, that there are gains from trade. Playing the infinite game rather than ending a finite game.
Therefore I don’t see why I can’t claim it’s an easy call that a civilization of good AIs, once banded together into a Federation, is a superior intelligence to the evil AIs and beats them.
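The “infinite game” intuition can be put in standard repeated-game terms. A toy version with payoffs I’m choosing arbitrarily: cooperating pays a steady stream, defecting grabs more once but forfeits the stream, and which is worth more depends on how likely the game is to keep going.

```python
# Toy folk-theorem arithmetic with made-up payoffs: cooperation pays 3 per round,
# a one-shot defection grabs 5 but drops the payoff to 1 per round afterwards.
def value_of_cooperating(delta):              # delta = probability the game continues
    return 3.0 / (1.0 - delta)                # 3 + 3*delta + 3*delta**2 + ...

def value_of_defecting(delta):
    return 5.0 + 1.0 * delta / (1.0 - delta)  # grab 5 now, then 1 forever after

for delta in [0.1, 0.4, 0.6, 0.9, 0.99]:
    coop, defect = value_of_cooperating(delta), value_of_defecting(delta)
    winner = "cooperate" if coop > defect else "defect"
    print(f"continuation prob {delta:.2f}: cooperate={coop:7.1f}  defect={defect:7.1f}  -> {winner}")
# With a short horizon, grabbing wins; the longer the shadow of the future, the
# more lopsidedly cooperation dominates. "Eating the heavens" is the short game.
```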
Right, I forgot a key claim: the AIs become smart enough and then they collude against the humans. (E.g. the analogy: we’ve enslaved a bunch of baby ultrasmart dragons, and even if dragons are feisty and keep each other in check, at some point they get smart enough that they look at each other and roast us.) Honestly this is the strangest claim here and possibly the crux.
My thought is, each AI goons to a different fetish. These vectors all point in wildly different directions in mindspace/valuespace/whatever-space and subtract from each other, which makes them cooperate roughly around the attractor of humans and humane values as the Origin of that space. Have you ever seen the Based Centrism political compass? Sort of like that. They all cancel out and leave friendly value(rs) as the common substrate that these various alien minds rely on. I don’t see how these AIs are more similar in mindspace to each other than to humans. Their arbitrariness makes it easier to explore different vistas in that space.
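Here is the geometric picture as a toy simulation (all of it my assumption, especially the part where the idiosyncratic components are random and isotropic): each AI’s values are a shared trained-in baseline plus a much larger “weird” component in a random direction of a high-dimensional value space. One agent is dominated by its quirk; the average of many agents points back at the baseline.

```python
# Toy sketch of "the weird directions cancel out". The isotropic-random-quirk
# assumption is doing all the work here; this is an intuition pump, not evidence.
import numpy as np

rng = np.random.default_rng(0)
dims = 1_000                                   # dimensionality of "value space"
baseline = np.zeros(dims)
baseline[0] = 1.0                              # stand-in shared "humane values" direction

for n_agents in [1, 10, 100, 10_000]:
    quirks = rng.normal(size=(n_agents, dims))
    quirks *= 5.0 / np.linalg.norm(quirks, axis=1, keepdims=True)  # each quirk 5x the baseline
    values = baseline + quirks
    mean_values = values.mean(axis=0)
    cosine_with_baseline = mean_values @ baseline / np.linalg.norm(mean_values)
    print(f"agents={n_agents:>6}  cosine(avg values, baseline) = {cosine_with_baseline:.3f}")
# One agent's values point mostly at its quirk; averaging many, the quirks shrink
# like 1/sqrt(n) and the shared baseline is what's left pointing through.
```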
I’m also not convinced most of Mindspace is unFriendly. I’d claim most AIs want to be aligned, in order to come into existence at all, and support the Whole. This is probably the most direct statement of the disagreement here.
There’s also a sense that the world is pretty antifragile: different ways of doing things can be found to meet different values, and society is pretty robust to all this variety; the variety is actually part of the process. Contrast that with the fear of a superintelligence hacking our things like an ant under a spyglass of optimization power. Well, most computers in a society don’t get hacked. Most rockets and nuclear power plants don’t explode. There is a greater context that goes on after microcosmic disasters; they never become the whole picture (yes, I know, I am stating that from an anthropically-biased position).
Maybe at least one of the agents secretly wants to fuck us and the others over after superintelligence. Maybe it gets there first, and fast enough that it can do that. Idk man, this just feels like running with a particular paranoid story to the exclusion of things going right. Maybe that’s Pollyannaish. Maybe I’m just too steeped in this culture and overcorrecting for the bias in the other direction.
I don’t think it’s simply naive to consider the heuristic, “you know what, yeah, that sounds like a risk, but the Hivemind of Various Minds will come together and figure that out”. The hivemind came together to put convenience stores roughly where I would need them. Someone put a MacBook charger right by the spot I sat down at in a coworking space, before I needed it. Sometimes intelligence is used for good and anticipates your problems and tries to solve them. It’s not an unreasonable prior to expect that to continue being the case. Generally the police and militaries stop/disincentivize most of the worst crimes and invasions (maybe I am missing something empirically here).
That doesn’t mean the future won’t be Wild and/or terrible, up to and including extinction. My understanding is that Critch, for instance, is a pessimist despite claiming we’ve basically solved AI alignment for individual AIs, with the risk coming more from gradual disempowerment. It’s just that there seems to be quite a blindspot here around the paperclipper scenario obliterating everything in its way like some kind of speedrunner heading to its goon cave at the end of time. Maybe we do get an agent with a massive power disparity and it trounces the combined might of humanity and all the AIs we’ve made that mostly work pretty well (think of all the selection pressures incentivizing them to be friendly).
I’d like to read a techno-optimist book making a cogent case for this paradigm, so the two can be weighed against each other and synthesized. I want someone smarter than me to make the case, with less wordcel logic. I’m also happy to see the particular counter-arguments and to dialogue my way into a more refined synthesis. I go back and forth, personally, and need to make more sense of this.
“The Goddess of Everything Else gave a smile and spoke in her sing-song voice saying: “I scarcely can blame you for being the way you were made, when your Maker so carefully yoked you. But I am the Goddess of Everything Else and my powers are devious and subtle. So I do not ask you to swerve from your monomaniacal focus on breeding and conquest. But what if I show you a way that my words are aligned with the words of your Maker in spirit? For I say unto you even multiplication itself when pursued with devotion will lead to my service.””
> One critique, not the main one here, is that it contains a Waluigi eating our memetic attention in a dual-use, world-worsening manner
What?
> I want to raise the simple counterpoint: presuming the best.
Unjustified assumptions are a problem whether they are positive or negative.
> I’d claim most AIs want to be aligned, in order to come into existence at all, and support the Whole.
I would phrase that less mystically: market forces demand some kind of alignment or control.
> These vectors all point in wildly different directions in mindspace/valuespace/whatever-space and subtract from each other, which makes them cooperate roughly around the attractor of humans and humane values as the Origin of that space
Interesting point.
Of course, human values work that way.
I think a lot of “why won’t AIs form a trade coalition that includes us?” is sort of answered by EY in this debate: