My biggest critique of this approach is that it takes too literally the analogy that we will eventually be to superintelligence what dogs are to humans, and extrapolates it to suggest that we will be just as helpless as dogs are today.
Even if this comparison of intelligence is true on relative terms, on absolute terms we are still much smarter than dogs are. We will still be able to logically comprehend (at a much simpler level relative to the AIs) what is good to us over a long term, in a way that dogs can’t. It follows that if we manage to create aligned AI (it will listen to us and dumb things down without maliciously misrepresenting what’s going on), we (well, some of us) will be able to steer the future.
My biggest critique of this approach is that it takes too literally the analogy that we will eventually be to superintelligence what dogs are to humans, and extrapolates it to suggest that we will be just as helpless as dogs are today.
Thank you, that’s an interesting point. I’ll try to lay out my counterargument as clearly as I can.
I mentioned dogs not because they have a specific level of intelligence relative to humans, but because they got a relatively good deal. Chimps are a lot smarter than dogs, and they’re worse off. Homo erectus had culturally transmitted tools, some art, seafaring craft of some sort, and possibly language. And they’re extinct. The only common factor across these cases is that runners-up in the intelligence race didn’t get to make the important decisions.
In fact, AGI wouldn’t need to be much smarter than humans to outcompete us in the long run. For example, if it’s no smarter than the average Nobel Prize researcher, if it’s able to work productively for $1/hour, and if it’s able to copy-and-paste multiple copies of itself, then it would already be our evolutionary superior. We might be able to remain in charge for a while. But that’s sort of like how a multicellular organism can survive for many decades. In the end, if nothing else kills them first, multicellular organisms tend to die of cancer. This is a case of local Darwinian incentives gradually eroding “cellular alignment” with the larger multicellular organism. Similarly, if the world consists of slow, expensive and frankly stupid humans, who can’t even pass down learned knowledge “genetically” with a simple copy-paste (how primitive!), and also highly cost-effective and intelligent AIs, then there’s a constant danger of alignment failing somewhere, and a “cancerous” AI replicator escaping control.
So even if we somehow manage to create “aligned” AI, I don’t expect that to last. When you’re too stupid and too expensive to be allowed anywhere near the real economy, you’re in a very dangerous long-term position.
We will still be able to logically comprehend (at a much simpler level relative to the AIs) what is good to us over a long term, in a way that dogs can’t.
I’m not convinced of this. Paul Graham once described something he called the Blub paradox. He explained this in terms of programming languages, but I suspect that it applies more broadly:
Programmers get very attached to their favorite languages, and I don’t want to hurt anyone’s feelings, so to explain this point I’m going to use a hypothetical language called Blub. Blub falls right in the middle of the abstractness continuum. It is not the most powerful language, but it is more powerful than Cobol or machine language.
And in fact, our hypothetical Blub programmer wouldn’t use either of them. Of course he wouldn’t program in machine language. That’s what compilers are for. And as for Cobol, he doesn’t know how anyone can get anything done with it. It doesn’t even have x (Blub feature of your choice).
As long as our hypothetical Blub programmer is looking down the power continuum, he knows he’s looking down. Languages less powerful than Blub are obviously less powerful, because they’re missing some feature he’s used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn’t realize he’s looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.
When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn’t even have y.
When we look “down”, chimps are obviously stupider than we are. They don’t have spoken language! They don’t have books! They can’t do real math! They can make “tools”, sure, but they’re basically pointy sticks, not factories, Space Shuttles, or computers. Their “economy” is based on family relationships and some individual reciprocity, and they don’t have even one joint stock company. Their idea of military strategy is to gang up in a band and go murder some other chimps, without understanding the role of non-commissioned officers or combined arms!
Chimps, to put it politely, have no clue.
But let’s try looking “up” the intelligence spectrum. What do we see? Well, it looks sort of like funny humans with some weird extra stuff. The AIs can’t be that much smarter than we are, right? And if we ask nicely, I’m sure they can explain everything important to us.
But when the AIs look “down” towards Homo sapiens, they just shake their heads. Why, humans can’t even understand Z! Even if you take something really simple, like how isomorphisms between topoi and subsets of the lambda calculus make it trivial to design powerful custom programming languages for specific tasks, their eyes just glaze over! Even primitive baby AIs like Opus 4.5 could understand that. Can you imagine trying to explain to a human what replaced the economy, lol?
So here are some things which I expect to be true:
AI that was in the top 0.01% of human intelligence, that worked for a dollar an hour, and that could be replicated by copying a hard drive would already be enough to jeopardize human control of our futures.
Basic Darwinism suggests that highly resource-efficient replicators with a high rate of replication will ultimately tend to replicate.
Even weakly superintelligent AIs will have a broad range of powerful ideas and skills that humans are poorly equipped to understand, in much the same way that chimps don’t understand joint stock companies or combined arms warfare, or the way that Homo erectus doesn’t seem to have understood long-distance trade. This will make checking up on what the AIs are doing vastly harder.
My argument here is really just basic economics, politics and evolutionary biology. If you create something that renders human intellectual and physical labor economically worthless and evolutionarily uncompetitive, then the odds are excellent that you’re going to lose control. Maybe the AI will like keeping humans around as glorified pets! But that will be the AI’s decision, not ours.
Well, an aligned AI would do whatever the humans want.
If asked to not replicate even with the ability to, it wouldn’t. Or maybe you can tell it to replicate just enough to help you root out the actual AI replicators being built elsewhere, then stop at that point.
I think your argument does show how hard and fragile it is to deeply align AI in this way, though.
I don’t think you can rescue a sense of control or “steering” from a world with superintelligence, aligned or not. Even though we’re smarter than dogs, once you accept that an ASI more profoundly understands reality, we will be in an analogous situation to dogs. Dogs can’t conceptualize grocery stores, and yet we could dedicate ourselves to delivering them the best treats. Dogs might not care about how the supply chain is organized, but the kinds of treats they get and the impact they have on the world can’t be meaningfully controlled by them, since they can’t conceptualize it.
Blurring the lines even further, an ASI would understand the effect of exposing different truths to us about the nature of reality, so the priorities and trade-offs it makes in communication will have a compounding effect that steers us in particular directions. Another analogy is being driven around a foreign country by a trusted translator; their preferences will unavoidably dominate how you conceptualize and interact with the country even in the most benevolent scenarios.
I don’t think you can rescue a sense of control or “steering” from a world with superintelligence, aligned or not.
I think some level of “steering” is possible in a world with aligned AI.
Suppose someone made a superintelligence that sat in its box, worked out whether P=NP, and printed an answer of YES/NO/MAYBE. And then it shut itself down. (To be clear, this isn’t a box that the ASI can’t escape; it’s an ASI aligned to stay in its box.)
A world with ASI, but where humans are in control, is possible. It requires good alignment, and good coordination between humans. That said, the “stay in box, and do one thing” alignment feels philosophically simpler than the “coherent extrapolated volition” alignment.
This means paying a large capabilities tax. Most of the strange, wondrous and powerful things that ASI could make simply don’t exist in this world of boxed ASI.
Let’s say you want to do something more useful than the P=NP bot above. You design an ASI to cure ageing. Its main output is a chemical formula in standard notation. This AI is carefully programmed to think about the biochemistry, and only the biochemistry. It’s programmed to only go for a drug that works for standard drug biochemistry reasons. Anything at all weird, ask a human. If the humans can’t understand, don’t.
I do understand your second point, but perhaps the effect could be countered by simply instructing the aligned ASI to provide facts as objectively as possible and explicitly try to avoid steering.
Of course, the ASI would be able to predict the human response more or less perfectly, and so will know ahead of time what that response will be. But in the end I think what matters is that it’s still a human making the call, which the AI respects, and that the human would have made the same call even if the ASI (hypothetically) couldn’t know their full preferences.
If a parent were fully aligned with a child’s preferences, asked a question already knowing the child’s answer, and then acted accordingly, does it matter that the parent knew what the child was going to answer in the first place?
I like the parent/child analogy. To apply it to the human/AI dynamic, we need to imagine that it’s mutually understood that the child will never grow up and that they’ll be served by the parent for the rest of time. Now, concretely think about what it means for a parent to be aligned with a child’s preferences. Does the parent arrange the world such that their child can get variations of their favorite candy and play video games all day? Or does the parent make the child study, so they get good grades compared to their peers and feel dignified? Or somewhere in between, based on how mad the child gets when deprived of the video game? The parent can constantly ask the child which angles they prefer, but the child can’t comprehend the deeper implications and even the framing of truths can get them to give predictably different answers.
The life that the child will live is entirely dependent on the parent’s preferences, because affecting the world routes through the parent’s cognition. The child isn’t meaningfully “making a call” if they’re only making that specific call because their parent orchestrated the conditions for it, then presented a few options to them in bite-sized pieces, all the while knowing which one they’ll take (they can even load in the next candy before the kid asks for it).
The loss of agency I’m describing isn’t superficial. Another way to think about agency is in counterfactuals. I think there are many possible benevolent ASIs that would cater to the child in drastically different ways such that the child would be in agreement and enthusiastic the whole time. Once we create a benevolent ASI, we’re entering a regime where our decisions are no longer the cause of changes in the world. Only things that the ASI prefers will happen, and it would steer us in that direction with full understanding. I think your argument is essentially “but if it thinks our preferences are really important we’re still in control in some sense”. I’m saying “if it’s a lot smarter than us it will have to make many subtle decisions, large and small, and our preferences will be one small piece of a large machine. Our desires won’t be coherent at that scale and we won’t be able to make sense of what’s happening well enough to engage with it.”