I still don’t understand the concern that a misaligned AGI could carry out mass killings.
Even if an AGI did, for whatever reason, want to kill people: as soon as that happened, the physical force of governments would come into play. For example, the US military will NEVER accept any force becoming stronger than itself.
So essentially there are three ways such a misaligned, autonomous AI with the intention to kill could act, i.e. three possible strategies:
“Making humans kill each other”: through something like a cult (i.e. like contemporary extremist religions which invent their stupid justifications for killing humans; we have enough blueprints for that). But all humans following these “commands to kill” given by the AI would just be part of an organization deemed terrorist by the world’s governments, and those governments would use all their power to exterminate these followers.
“Making humans kill themselves”: here the AI would add intense, large-scale psychological torture to every aspect of life, to bring the majority of humanity into a state of mind (either very depressed or very euphoric) that tricks people into believing they actually want to commit suicide. So like a suicide cult. Protecting against this means building psychological resilience, but that’s more of an education thing (i.e. highly political), related to personal development and not technical at all.
“Killing humans through machines”: one example would be the AI building its own underground concentration camps or other mass-killing facilities, or building robots that do the mass killing. But even if it were able to build an underground robot army or underground killing chambers, the logistics alone would raise suspicion (i.e. even if the AI-run concentration camp could be built at all, the population would still need to be deported to these facilities, and as far as I know, most people don’t appreciate their loved ones being industrially murdered in gas chambers). The AI simply won’t physically be able to assemble the resources to gain more physical power than the US military or, for that matter, most other militaries in the world.
I don’t see any other ways. Humans have been pretty damn creative about how to commit genocides, and if any computer started giving commands to kill, the AI would never have more tanks, guns, poisons, or capabilities to hack and destroy infrastructure than Russia, China, or the US itself.
The only genuine concern I see is that AI should never make political decisions autonomously, i.e. a hypothetical AGI “shouldn’t” aim to take complete control of an existing country’s military. But even if it did, that would just be another totalitarian government, which is unfortunate, but also not unheard of in world history. From the practical side, i.e. in terms of lived human experience, it doesn’t really matter whether it’s a misaligned AGI or Kim Jong-Un torturing the population.
In the end, psychologically it’s a mindset thing: either we take the approach of “let’s build AI that doesn’t kill us”, or, from the start, we take the approach of “let’s build AI that actually benefits us” (like all the “AI for Humanity” initiatives). It’s not as if we first need to solve the killing problem and only once we’ve fixed that once and for all can we make AI good for humanity as an afterthought. That would be the same fallacy the entire domain of psychology fell into, where for many decades it was pathology-focused (i.e. just trying to fix issues) instead of empowering (i.e. building a mindset so that the issues don’t arise in the first place), and only positive psychology is finally changing that. So it very much is about optimism instead of pessimism.
I do think it’s not completely pointless to talk about these “alignment” questions. But not to change anything about the AI itself; rather, for the software engineers behind it to finally adopt some sort of morality themselves (i.e. to think about who they want to work for). Before there’s any AGI that wants to kill at scale, your evil government of choice will do that by itself.
Every misaligned AI will initially need to be built/programmed by a human, just to kick off the mass killing. And that evil human won’t give a single damn about all the thoughts, ideas, strategies, and rules that the AI alignment folks are establishing. So if AI alignment work obviously won’t have any effect on anything whatsoever, why bother with it instead of working on ways AI can add value for humanity?
I fully agree with your first statement!
To your question “why bother with alignment”: I agree that humans will misuse AGI even if alignment works—if we give everyone an AGI. But if we don’t bother with alignment, we have bigger problems: the first AGI will misuse itself. You’re assuming that alignment is easy or solved, and it’s just not.
I applaud your optimism vs. pessimism stance. If I have to choose, I’m an optimist every time. But if you have to jump off a cliff, neither optimism nor pessimism is the appropriate attitude. The appropriate attitude is: “How exactly does this bungee cord/parachute work, and how can I be sure it will work properly?” If there’s no parachute or bungee cord, the appropriate question is “how can I find one, or how can I avoid jumping off this cliff?”
Your post seems to be along the lines of “but it will be so fun while we’re in freefall, just assume you’ve got a bungee cord or parachute so you can enjoy the fall!”.
Sure, an AI should never make political decisions autonomously. Pretty much everyone agrees on that. The question is whether it could and whether it will if we’re not super careful. Which humans rarely are.
Your first few sentences show that you’re not familiar with the relevant ideas and theories about how AGI could easily outmaneuver humanity even though no human wants to let that happen.
These ideas have been written about so often that I don’t feel it’s proper or respectful for me to try to summarize them here. But I do want you on board, as part of the people who understand the dangers and are trying to help. So I’ll give a guess at what is missing from your logic that those of us who think about this full-time take as inexorable:
When you set powerful optimization processes in motion (like an AI learning), it’s really, really hard to predict what they end up learning and therefore doing. It’s not at all like writing code (which is itself buggy to some degree).
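Here’s a deliberately silly, self-contained toy of what I mean (a made-up cleaning-robot reward, not any real training setup or anyone’s actual reward function): we write down an innocent-looking proxy reward, let a brute-force “optimizer” search over behaviors, and the highest-scoring behavior is not the one the programmer had in mind.

```python
# Toy illustration (hypothetical scenario): reward a cleaning robot +1 per unit
# of dirt it picks up, then exhaustively "optimize" over short action sequences.
# The best-scoring plan is not "just clean the room" but "clean, dump the dirt
# back on the floor, and clean it up again" to farm extra reward.
from itertools import product

ACTIONS = ("clean", "dump")   # "dump" puts one collected unit of dirt back on the floor
INITIAL_DIRT = 3              # units of dirt on the floor at the start

def run(plan):
    """Simulate a plan; return the proxy reward earned and the final floor state."""
    floor, collected, reward = INITIAL_DIRT, 0, 0
    for action in plan:
        if action == "clean" and floor > 0:
            floor -= 1
            collected += 1
            reward += 1       # proxy reward: +1 per unit picked up
        elif action == "dump" and collected > 0:
            collected -= 1
            floor += 1        # no penalty for making the room dirty again
    return reward, floor

# Exhaustive search over all 8-step plans stands in for "a powerful optimizer".
best_plan, best_reward = None, -1
for plan in product(ACTIONS, repeat=8):
    reward, _ = run(plan)
    if reward > best_reward:
        best_plan, best_reward = plan, reward

print("intended behaviour (just clean):", run(("clean",) * 8))  # reward 3, floor clean
print("optimized behaviour:", best_plan, "reward:", best_reward)  # reward 5, via re-dirtying mid-plan
```

The gap between the reward we wrote down and the behavior we actually wanted is trivial to spot in a toy like this; with a far more capable optimizer it isn’t.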
I’m not saying alignment is impossible, just that it’s probably not easy. And humans have fucked up most projects on the first try. Failing to align a new being smarter than us might not allow another try, because it will act like it’s on our side until it no longer needs to, once it has a plan to take over.
If something is smarter than you and has goals that conflict with yours, the question isn’t whether it will outsmart you and get its way, the question is only how long that will take.
Something doesn’t have to want to kill you to kill you. We kill ants and monkeys, even though we rather like them. They’re just in the way or using resources we want to use otherwise.
LessWrong is full of those writings. Start at my user profile and work out from there if you are interested.
Oh, I think now I’m starting to get it! So essentially you’re afraid that we’re creating a literal God in the digital realm, i.e. an external being which has unlimited power over humanity? Because that’s absolutely fascinating! I hadn’t even connected these dots before, but it makes so much sense, because you’re attributing so many potential scenarios to AI which would normally only be attributed to the Divine. Can you recommend more resources regarding the overlap of AGI/AI alignment and theology?
Monkeys or ants might think humans are gods because we can build cities and cars and create ant poison. But we’re really not that much smarter than them, just smart enough that they have no chance of getting their way when humans want something different than they do.
The only assumptions are that there’s not a sharp limit to intelligence at the human level (and there really aren’t even any decent theories about why there would be), and that we’ll keep making AI smarter and more agentic (autonomous).
You’re envisioning AI smart enough to run a company better than a human. Wouldn’t that be smart enough to eventually outcompete humans if it wanted to? Let alone if it gets just a bit smarter than that. Which it will, unless all of humanity unites in deciding to never make AI smarter than humans. And humanity just doesn’t all agree on anything. So there’s the challenge—we’re going to build things smarter than us, so we’d better make sure their goals align with ours, or they will get their way—and they may want the resources we need to live.
Yep, 100% agree with you. I had read so much about AI alignment before, but to me it had always been really abstract jargon. I just didn’t understand why it was even a topic or why it was relevant, because, to be honest, in my naive thinking it all just seemed like an excessively academic thing, where smart people want to make the population feel scared so that their research institution gets the next big grant and they don’t need to care about real-life problems. Thanks to you, now I’m finally getting it. Thank you so much again!
At the same time, while I fully understand the “abstract” danger now, I’m still trying to understand the transition you’re making from “envisioning AI smart enough to run a company better than a human” to “eventually outcompeting humans if it wanted to”.
The way I initially thought about this “Capitalist Agent” was as a purely procedural piece of software. That is, it breaks down its main goal (in this case: earning money) into manageable sub-goals, until each of these sub-goals can be solved through either standard computing methods or some generative-AI integration.
As an example, I might say to my hypothetical “Capitalist Agent”: “Earn me a million dollars by selling books of my poetry”. I would then give it access to a bank account (through some sort of read-write Open Banking API) as well as the PDFs of my poetry to be published. The first thing it might do is found a legal entity (a limited liability company), for which it might search for a suitable advisor on Google and send that advisor automatically generated emails with my business idea, or it might even take the “computer use” approach, if my local government is digitized enough, and fill out the respective eGovernment forms online automatically. Later it would do something similar by automatically “realizing” that it needs to make deals with publishing houses, with printing facilities, etc. Essentially just basic Robotic Process Automation on steroids: everything a human could do on a computer, this software could do as well.
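To make that concrete, here is a minimal sketch of the decomposition loop I have in mind. Everything in it is hypothetical: the tool names (send_email, fill_web_form, call_banking_api) and the llm() helper are placeholders for whatever RPA integrations and generative-AI calls a real “Capitalist Agent” would use, and the canned answers just let the sketch run end to end.

```python
# Minimal sketch (all names hypothetical) of the "Capitalist Agent" loop described
# above: recursively break a goal into sub-goals until each one maps onto a concrete,
# ordinary computer action, then execute that action.
from typing import Callable, Dict

# Hypothetical "standard computing methods" (RPA-style integrations) the agent can invoke.
TOOLS: Dict[str, Callable[[str], str]] = {
    "send_email":       lambda step: f"[email sent] {step}",
    "fill_web_form":    lambda step: f"[web form submitted] {step}",
    "call_banking_api": lambda step: f"[banking API called] {step}",
}

def llm(prompt: str) -> str:
    """Stand-in for a generative-AI call; returns canned answers so the sketch runs offline."""
    lowered = prompt.lower()
    if prompt.startswith("Which single tool"):
        if "email" in lowered:
            return "send_email"
        if "form" in lowered:
            return "fill_web_form"
        if "banking" in lowered:
            return "call_banking_api"
        return "none"
    # Otherwise treat the prompt as a decomposition request and return canned sub-steps.
    return ("send_email to a company-formation advisor about founding an LLC\n"
            "fill_web_form on the eGovernment portal to register the company\n"
            "send_email to publishing houses and printing facilities about the poetry books\n"
            "call_banking_api to track incoming revenue on the account")

def execute(goal: str) -> None:
    """Depth-first execution: run a tool if one fits the goal, otherwise decompose and recurse."""
    tool_name = llm(f"Which single tool, or 'none', handles this step: {goal}")
    if tool_name in TOOLS:
        print(TOOLS[tool_name](goal))
        return
    for step in llm(f"Break this goal into tool-sized steps: {goal}").splitlines():
        execute(step)

execute("Earn me a million dollars by selling books of my poetry")
```

Nothing in the loop is exotic; it’s goal decomposition bolted onto ordinary automation, which is exactly what I mean by “Robotic Process Automation on steroids”.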
But: It would still need to obey the regular laws of economics, i.e. it couldn’t create money out of thin air to fulfill its tasks. Pretty much anything it would “want to do” in the real world costs money.
So in the next step, let’s assume that, after I have gotten rich with my poetry, the next instruction I give this agentic “AGI” is: “Please, dear AGI, please now murder all of humanity.”
Then it thinks through all the steps (i.e. procedurally breaking the big task down into chunks which can be executed by a computer) and eventually it can say with absolute certainty: “Okay Julian, your wish is my command.”
Obviously the first thing it would do is create the largest, most profitable commercial company in the world, initially playing by the rules of capitalism until it has accumulated so much money that it can take over (i.e. simply “buy out”) an entire existing government that already has nuclear missiles; at least, that would be the “most efficient” and fail-safe approach I can see. Its final action would be to press the red button, which would exterminate all of humanity. Success!
But the thing is: no one would know until it’s too late. Obviously, as Mr. Evil, I wouldn’t tell anyone that in my business dealings I am actually guided by an AI/AGI. I would appear on the covers of Forbes, Fortune and whatever else, eventually I would be the richest person in the world, and everyone would pat me on the shoulder for my “visionary thinking” and my “innovative style of doing business”, because everyone would believe that I am the sole decision maker in that company. The AI would stage things to look as if I were a big philanthropist, “saving” humanity from nuclear weapons. The AI would make sure that it always stays behind the scenes, that no one except me will ever even know about its existence. I would be a wolf in sheep’s clothing until the very last moment, and no one could stop me, because everyone is fooled by me.
Even though there’s no rational reason why I would even want to kill humanity, it is entirely possible for any human to develop such an “idée fixe”.
In a way, I am actually a lot more afraid of my scenario. And that’s exactly why I wrote this blog post about “The Capitalist Agent” and why I’m criticizing ongoing AI alignment research: of course, hypothetically AI could turn itself against humanity completely autonomously. But at the end of the day, there would still need to be a human “midwifing” that AGI, someone who allows the AI to interface with the monetary and financial system in particular, for that software to be able to do anything in the “real world” at all.
Right now (at least that’s the vibe in the industry) one of the most “acceptable” uses for AI is to automate business processes, customer interactions (e.g. in customer support), etc. But if you extrapolate that, you get the puzzle pieces to run every part of a business in a semi-automated and eventually fully automated fashion (that’s what I mean by “Capitalist Agent”). This means that no outside observer can distinguish anymore whether a business owner is being guided by AI or not, because no business owner will honestly tell you. And every one of them can always say “but I’m just earning so much money to become a philanthropist later”; they always have plausible deniability, right up until they have accumulated so much money through this automated, AI-run business that they can use it for very big evil very quickly. It’s just impossible to know beforehand, because you can’t learn the “true motivation” inside any human’s head.
The only thing that you as AI alignment researchers will eventually be confronted with is AIs fixated on earning as much money as possible, because money means power, and only with power can you cause violence. But it’s simply impossible for you to know what the person for whom the AI is earning all this money actually wants to do with it in the future.
The main question which you, as AI alignment researchers, will need to answer is: “Is it moral, and aligned with societal values, for any AI-based system to earn money for an individual or a small group of people?”
That is, to investigate all the nuances of that question and to make concrete rules, and eventually laws, for business owners rather than for AI developers or programmers.
Or is that what you’re already doing and I’m just reinventing the wheel? (sorry if I did, sometimes I just need to go through the thought process myself to grasp a new topic)