I’m sure you’ve already thought about this, but it seems like the people who would be willing and able to jump through all of the hoops necessary would likely have a higher propensity towards power-seeking and dominance. So if you don’t edit the personality as well, what was it all for besides creating a smarter god-emperor? I think that in the sane world you’ve outlined where people deliberately avoid developing AGI, an additional level of sanity would be holding off on modifying intelligence until we have the capacity to perform the personality edits to make it safe.
I can just imagine this turning into a world where the rich who are able to make their children superbabies compete with the rest of the elite over whose child will end up ruling the world.
I’m sorry, but I’d rather be turned into paperclips than live in a world where a god-emperor can decide to torture me with their AGI-slave for the hell of it. How is that a better world for anyone but the god-emperor? But people are so blind and selfish, they just assume that they or their offspring would be the god-emperor. At least with AI, people are scared enough that they’re putting focused effort into trying to make it nice. People won’t put that much effort into their children.
I mean hell, figuring out personality editing would probably just make things backfire. People would choose to make their kids more ruthless, not less.
It’s a fair concern. But the problem of predicting personality can be solved! We just need more data.
I also worry somewhat about brilliant psychopaths. But making your child a psychopath is not necessarily going to give them an advantage.
Also can you imagine how unpleasant raising a psychopath would be? I don’t think many parents would willingly sign up for that.
I certainly wouldn’t sign up to do that, but the type of individual I’m concerned about likely wouldn’t mind sacrificing nannies if their lineage could “win” in some abstract sense. I think it’s great that you’re proposing a plan beyond “pray the sand gods/Sam Altman are benevolent.” But alignment is going to be an issue for superhuman agents, regardless of whether they’re human or not.
Agreed. I’ve actually had a post in draft for a couple of years that discusses some of the parallels between alignment of AI agents and alignment of genetically engineered humans.
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement. But in the long term (multiple generations), it would be a concern.
If you look at the grim history of how humans have treated each other on this planet, I don’t think it’s justified to have a prior that this is gonna go well.
Humans didn’t have the potential for runaway self-improvement relative to apes. That was little comfort for the apes.
That sounds very interesting! I always look forward to reading your posts. I don’t know if you know any policy people, but in this world, it would need to be punishable by jail time to genetically modify intelligence without selecting for pro-sociality. Any world where that is not the case seems much, much worse than just getting turned into paperclips.
The runaway self-improvement problem is IMO vastly outweighed by other problems with aligning humans, like the fact that any control technique we use on AI would be illegal on humans because it’s essentially equivalent to brainwashing. For that reason I consider AIs much more alignable than humans, and I think the human intelligence augmentation path is far more risky and fraught for alignment purposes than people assume.
I agree. At least I can laugh if the AGI just decides it wants me as paperclips. There will be nothing to laugh about with ruthless power-seeking humans with godlike power.
A lot of (most?) humans end up nice without needing to be controlled / “aligned”, and I don’t particularly expect this to break if they grow up smarter. Trying to control / “align” them wouldn’t work anyway, which is also what I predict will happen with sufficiently smart AI.
I think this is my disagreement: I don’t think most humans are in fact nice/aligned to each other by default. The reason this doesn’t lead to catastrophe, broadly speaking, is a combination of (a) being able to rely on institutions/mechanism design, which means that even if people are misaligned you can still keep people well off under certain assumptions (capitalism and the rule of law being one such example), and (b) the inequalities not being so great that individual humans can found their own societies, except in special cases.
Even here, I’d argue that human autocracies are very often severely misaligned with their citizens’ values.
To be clear about what I’m not claiming: I’m not saying that alignment is worthless, or that it always or very often fails; my view is consistent with a world where >50-60% of alignment attempts are successful.
This means I’m generally much more scared of very-outlier-smart humans, for example a +7-12 SD human (assuming no other crippling disabilities) in power over a large group of citizens, unless they are very pro-social/aligned with their citizenry. (See the rough sketch below for a sense of how extreme that range is.)
I’m not claiming that alignment will not work, or even that it will very often not work, but rather that the chance of failure is real and the stakes are quite high long-term.
(And that’s not even addressing how you could get super-smart people to work on the alignment problem).
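For a sense of scale on that “+7-12 SD” range, here is a minimal sketch. It assumes a plain Gaussian model with IQ-style scaling (mean 100, SD 15); those are illustrative assumptions only, since real trait distributions have fatter tails and engineered humans would by construction sit outside the natural distribution anyway:

```python
# Sketch only: how rare "+5 / +7 / +12 SD" would be under a plain Gaussian model
# with IQ-style scaling (mean 100, SD 15). Real trait distributions have fatter
# tails, and engineered humans would sit outside the natural distribution anyway.
from scipy.stats import norm

for sd in (5, 7, 12):
    iq_equivalent = 100 + 15 * sd   # e.g. +7 SD -> 205 on an IQ-style scale
    tail_prob = norm.sf(sd)         # P(Z > sd) for a standard normal
    print(f"+{sd} SD ~ {iq_equivalent} IQ-equivalent, "
          f"natural frequency ~ 1 in {1 / tail_prob:,.0f}")
```

Under this naive model, +5 SD is already roughly a one-in-a-few-million occurrence, and +7 SD and above falls effectively outside the natural human distribution entirely.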
This is just a definition for the sake of having a definition, but I think you could define a human as aligned if they could be given an ASI slave without being an S-risk. I really think that under this definition, the absolute upper bound on the fraction of “aligned” humans is 5%, and I think it’s probably a lot lower.
I’m more optimistic, in that the upper bound could be as high as 50-60%, but yeah, the people in power are unfortunately not part of that group, and I’d only trust 25-30% of the population in practice if they had an ASI slave.
What would it mean for them to have an “ASI slave”? Like having an AI that implements their personal CEV?
Yeah something like that, the ASI is an extension of their will.
So you think that, for >95% of currently living humans, the implementation of their CEV would constitute an S-risk in the sense of being worse than extinction in expectation? This is not at all obvious to me; in what way do you expect their CEVs to prefer net suffering?
(And that’s not even addressing how you could get super-smart people to work on the alignment problem).

I mean if we actually succeeded at making people who are +7 SD in a meaningful way, I’d expect that at least a good chunk of them would figure out for themselves that it makes sense to work on it.
That requires either massive personality changes to make them more persuadable, or massive willingness of people to put genetic changes in their germline, and I don’t expect either of these to happen before AI automates everything and either takes over (leaving us extinct) or humans/other AIs successfully control/align the AIs.
(A key reason for this is that GeneSmith admitted that the breakthroughs in germline engineering can’t transfer to the somatic side, which means we’d have to wait a minimum of 25-30 years for an enhanced generation to grow up, given that society won’t maximally favor the genetically lucky, and that’s way beyond most plausible AI timelines at this point.)
If they’re that smart, why will they need to be persuaded?
Because they might consider other problems more worth their time, since intelligence enhancement changes their values little.
And maybe they believe that AI alignment isn’t impactful for technical/epistemic reasons.
I’m confused/surprised I need to make this point, because I don’t automatically think they will be persuaded that AI alignment is a big problem they need to work on; some persuasion effort will likely still be required.
I mean if they care about solving problems at all, and we are in fact correct about AGI ruin, then they should predictably come to view it as the most important problem and start to work on it?
Are you imagining they’re super myopic or lazy and just want to think about math puzzles or something? If so, my reply is that even if some of them ended up like that, I’d be surprised if they all ended up like that, and if so that would be a failure of the enhancement. The aim isn’t to create people who we will then carefully persuade to work on the problem, the aim is for some of them to be smart + caring + wise enough to see the situation we’re in and decide for themselves to take it on.
More so that I’m imagining they might not even have heard of the argument. It’s helpful to note that people like Terence Tao, Timothy Gowers, and more are all excellent in their chosen fields, but most people who have a big impact on the world don’t go into AI alignment.
Remember, superintelligence is not omniscience.
So I don’t expect them to be self-motivated to work on this specific problem without at least a little persuasion.
I’d expect a few superintelligent adults to join alignment efforts, but nowhere near thousands or tens of thousands, and I’d upper bound it at 300-500 new researchers at most in 15-25 years.
Much less impactful than automating AI safety.
I don’t think this will work.
How much probability do you assign to automating AI safety not working in time? I believe that preparing to automate AI safety is probably the highest-value work in terms of pure ability to reduce x-risk, assuming it does work, so I assign much higher EV to automating AI safety relative to other approaches.
I think I’m at <10% that non-enhanced humans will be able to align ASI in time, and if I condition on them succeeding somehow I don’t think it’s because they got AIs to do it for them. Like maybe you can automate some lower level things that might be useful (e.g. specific interpretability experiments), but at the end of the day someone has to understand in detail how the outcome is being steered or they’re NGMI. Not sure exactly what you mean by “automating AI safety”, but I think stronger forms of the idea are incoherent (e.g. “we’ll just get AI X to figure it all out for us” has the problem of requiring X to be aligned in the first place).
As for how a plan to automate AI safety would work out in practice, assuming a relatively strong version of the concept, see the post below; another post by the same author, talking more about the big risks discussed in the comments, is forthcoming:
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
In general, I think the crux is that in most timelines (at a lower bound, 65-70% of them) where AGI is developed relatively soon (roughly 2030-2045) and the alignment problem isn’t solvable by default, or is at least non-trivially tricky to solve, conditioning on alignment success looks more like “we successfully figured out how to prepare for AI automation of everything, and we managed to use alignment and control techniques well enough that we can safely pass most of the effort to AI”, rather than other end states like “humans are deeply enhanced” or “lawmakers actually coordinated to pause AI, and are actually giving alignment organizations enough funding that we can make AI safe.”
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement. But in the long term (multiple generations), it would be a concern.

How do you know you can afford to wait multiple generations? My guess is that superhuman 6-year-olds demonstrating their capabilities on YouTube would be sufficient to start an international arms race for more superhumans (increase the number of people, and increase the capability level of each person). And once the arms race is started, it may never stop until the end state of this self-improvement is hit.
I mean hell, figuring out personality editing would probably just make things backfire. People would choose to make their kids more ruthless, not less.

Not at all obvious to me this is true. Do you mean to say a lot of people would, or just some small fraction, and that you think a small fraction is enough to worry?
I should have clarified, I meant a small fraction and that that is enough to worry.
After I finish my methods article, I want to lay out a basic picture of genomic emancipation. Genomic emancipation means making genomic liberty a right and a practical option. In my vision, genomic liberty is quite broad: it would include for example that parents should be permitted and enabled to choose:
to enhance their children (e.g. supra-normal health; IQ at the outer edges of the human envelope); and/or
to propagate their own state even if others would object (e.g. blind people can choose to have blind children); and/or
to make their children more normal even if there’s no clear justification through beneficence (I would go so far as to say that, for example, parents can choose to make their kid have a lower IQ than a random embryo from the parents would be in expectation, if that brings the kid closer to what’s normal).
These principles are narrower than general genomic liberty (“parents can do whatever they please”), and I think they have stronger justifications. I want to make these narrower “tentpole” principles inside of the genomic liberty tent, because the wider principle isn’t really tenable, in part for the reasons you bring up. There are genomic choices that should be restricted—perhaps by law, or by professional ethics for clinicians, or by avoiding making it technically feasible, or by social stigma. (The implementation seems quite tricky; any compromise of full genomic liberty comes with costs of its own as well as preventing costs. And at least to some small extent, it erodes the force of genomic liberty’s contraposition to eugenics, which seeks to impose population-wide forces on individuals’ procreative choices.)
Examples:
As you say, if there’s a very high risk of truly egregious behavior, that should be pushed against somehow.
Example: People should not make someone who is 170 Disagreeable Quotient and 140 Unconscientiousness Quotient, because that is most of the way to being a violent psychopath.
Counterexample: People should, given good information, be able to choose to have a kid who is 130 Disagreeable Quotient and 115 Unconscientiousness Quotient, because, although there might be associated difficulties, that’s IIUC a personality profile enriched for creative genius. (A rough numerical sketch of how extreme these two profiles are appears after these examples.)
People should not be allowed to create children with traits specifically designed to make the children suffer. (Imagine for instance a parent who thinks that suffering, in itself, builds character or makes you productive or something.)
Case I’m unsure about, needs more investigation: Autism plus IQ might be associated with increased suicidal ideation (https://www.sciencedirect.com/science/article/abs/pii/S1074742722001228). Not sure what the implication should be.
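To put rough numbers on the two personality profiles above, here is a minimal sketch. It assumes the hypothetical “Disagreeable Quotient” / “Unconscientiousness Quotient” scales work like IQ (mean 100, SD 15, approximately normal) and that the two traits are independent; neither assumption is guaranteed, so treat the output as illustrative only:

```python
# Sketch only: how extreme the example personality profiles are, assuming the
# hypothetical DQ/UQ scales are IQ-like (mean 100, SD 15, ~normal) and the two
# traits are independent. Real traits correlate, so this is illustrative only.
from scipy.stats import norm

profiles = {
    "restricted example (170 DQ, 140 UQ)": (170, 140),
    "permitted counterexample (130 DQ, 115 UQ)": (130, 115),
}

for label, scores in profiles.items():
    z_scores = [(s - 100) / 15 for s in scores]
    # Joint tail probability under the independence assumption.
    joint = 1.0
    for z in z_scores:
        joint *= norm.sf(z)
    sds = ", ".join(f"+{z:.1f} SD" for z in z_scores)
    print(f"{label}: {sds}; joint rarity ~ 1 in {1 / joint:,.0f}")
```

Under those assumptions the restricted profile is on the order of a one-in-10^8 combination, while the permitted counterexample is closer to one in a few hundred, which is roughly the gap the example/counterexample pair is meant to illustrate.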
Another thing to point out is that, to a significant degree, in the longer term many of these things should self-correct, through the voice of the children (e.g. if a deaf kid grows up and starts saying “hey, listen, I love my parents and I know they wanted what was best for me, but I really don’t like that I didn’t get to hear music and my love’s voice until I got my brain implant, please don’t do the same for your kid”), and through seeing the results in general. If someone is destructively ruthless, it’s society’s job to punish them, and it’s parents’ job to say “ah, that is actually not good”.
In that case I’d repeat GeneSmith’s point from another comment: “I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement.” If we have a whole bunch of super smart humans of roughly the same level who are aware of the problem, I don’t expect the ruthless ones to get a big advantage.
I mean I guess there is some sort of general concern here about how defense-offense imbalance changes as the population gets smarter. Like if there’s some easy way to destroy the world that becomes accessible with IQ > X, and we make a bunch of people with IQ > X, and a small fraction of them want to destroy the world for some reason, are the rest able to prevent it? This is sort of already the situation we’re in with AI: we look to be above the threshold of “ability to summon ASI”, but not above the threshold of “ability to steer the outcome”. In the case of AI, I expect making people smarter differentially speeds up alignment over capabilities: alignment is hard and we don’t know how to do it, while hill-climbing on capabilities is relatively easy and we already know how to do it.
I should also note that we have the option of concentrating early adoption among nice, sane, x-risk aware people (though I also find this kind of cringe in a way and predict this would be an unpopular move). I expect this to happen by default to some extent.