How much do people know about the genetic components of personality traits like empathy? Editing personality traits might be almost as controversial as modifying “vanity” traits, or even more so. But in the sane world you sketched out, this could be a simple first step of alignment: “We are about to introduce agents more capable than any humans except for extreme outliers: let’s make them nice.” Also, curing personality disorders like NPD and BPD would do a lot of good for subjective wellbeing.
I guess I’m just thinking of a failure mode where we create superbabies who solve task-alignment and then control the world. The people running the world might be smarter than the current candidates for god-emperor, but we’re still in a god-emperor world. This also seems like the part of the plan most likely to fail. The people who would pursue making their children superbabies might be disinclined towards making their children more caring.
Very little at the moment. Unlike intelligence and health, a lot of the variance in personality traits seems to be the result of combinations of genes rather than purely additive effects.
This is one of the few areas where AI could potentially make a big difference. You need more complex models to figure out the relationship between genes and personality.
But the actual limiting factor right now is not model complexity, but rather data. Even if you have more complex models, I don’t think you’re going to be able to actually train them until you have a lot more data. Probably a minimum of a few million samples.
We’d like to look into this problem at some point and make scaling law graphs like the ones we made for intelligence and disease risk but haven’t had the time yet.
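For concreteness, here is a minimal sketch of the kind of scaling-law extrapolation described above: fit a power law to predictor accuracy vs. training-set size, then extrapolate to a target accuracy. All numbers below are hypothetical placeholders, not real estimates from our work.

```python
import math

# Hypothetical scaling data (illustrative only): GWAS sample size n
# vs. variance explained (R^2) by a predictor trained on n samples.
samples = [10_000, 30_000, 100_000, 300_000, 1_000_000]
r2      = [0.010,  0.022,  0.055,   0.105,   0.170]

# Fit R^2 ~ c * n^alpha by ordinary least squares in log-log space.
xs = [math.log(n) for n in samples]
ys = [math.log(v) for v in r2]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
c = math.exp(my - alpha * mx)

# Extrapolate: how many samples to reach a target R^2 of 0.30?
n_needed = (0.30 / c) ** (1 / alpha)
print(f"alpha ~ {alpha:.2f}, samples needed for R^2=0.30 ~ {n_needed:.2e}")
```

With these made-up numbers the extrapolation lands around two million samples, in line with the rough “few million” minimum mentioned above; the real exercise would use actual predictor performance at each sample size, and the power-law form itself is an assumption.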
This is starting to sound a lot like AI actually. There’s a “capabilities problem” which is easy, an “alignment problem” which is hard, and people are charging ahead to work on capabilities while saying “gee, we’d really like to look into alignment at some point”.
Highly dependent on culturally transmitted info, including in-person.
Humans, genomically engineered or not, come with all the stuff that makes humans human: fear, love, care, empathy, guilt, language, etc. (Removing any human universals should be banned, though defining that seems tricky.) So new humans are close to us in values-space, and come with the sort of corrigibility that humans have, which is, you know, not a guarantee of safety, but still some degree of (okay, I’m going to say something that will trigger your buzzword detector, but I think it’s a fairly precise description of something clearly real) radical openness to co-creating shared values.
Tell that to all the other species that went extinct as a result of our activity on this planet?
I think it’s possible that the first superbaby will be aligned, same way it’s possible that the first AGI will be aligned. But it’s far from a sure thing. It’s true that the alignment problem is considerably different in character for humans vs AIs. Yet even in this particular community, it’s far from solved—consider Brent Dill, Ziz, Sam Bankman-Fried, etc.
Not to mention all of history’s great villains, many of whom believed themselves to be superior to the people they afflicted. If we use genetic engineering to create humans who are actually, massively, undeniably superior to everyone else, surely that particular problem is only gonna get worse. If this enhancement technology is going to be widespread, we should be using the history of human activity on this planet as a prior. Especially the history of human behavior towards genetically distinct populations with overwhelming technological inferiority. And it’s not pretty.
So yeah, there are many concrete details which differ between these two situations. But in terms of high-level strategic implications, I think there are important similarities. Given the benefit of hindsight, what should MIRI have done about AI back in 2005? Perhaps that’s what we should be doing about superbabies now.
Tell that to all the other species that went extinct as a result of our activity on this planet?
Individual humans.
Brent Dill, Ziz, Sam Bankman-Fried, etc.
These are incredibly small peanuts compared to AGI omnicide.
You’re somehow leaving out all the people who are smarter than those people, and who were great for the people around them and humanity? You’ve got like 99% actually alignment or something, and you’re like “But there’s some chance it’ll go somewhat bad!”… Which, yes, we should think about this, and prepare and plan and prevent, but it’s just a totally totally different calculus from AGI.
I’d flag here that the 99% number seems very easy to falsify, based solely on the 20th-century experience of the two world wars, as well as that century’s genocides and civil wars; quite often one human group is vastly unaligned with another human group, causing mass strife and chaos.
I’m saying that (waves hands vigorously) 99% of people are beneficent or “neutral” (like, maybe not helpful / generous / proactively kind, but not actively harmful, even given the choice) in both intention and in action. That type of neutral already counts as in a totally different league of being aligned compared to AGI.
one human group is vastly unaligned to another human group
Ok, yes, conflict between large groups is something to be worried about, though I don’t much see the connection with germline engineering. I thought we were talking about, like, some liberal/techie/weirdo people have some really really smart kids, and then those kids are somehow a threat to the future of humanity that’s comparable to a fast unbounded recursive self-improvement AGI foom.
I’m saying that (waves hands vigorously) 99% of people are beneficent or “neutral” (like, maybe not helpful / generous / proactively kind, but not actively harmful, even given the choice) in both intention and in action. That type of neutral already counts as in a totally different league of being aligned compared to AGI.
I think this is ultimately the crux, at least relative to my values, I’d expect at least 20% in America to support active efforts to harm me or my allies/people I’m altruistic to, and do so fairly gleefully (an underrated example here is voting for people that will bring mass harm to groups they hate, and hope that certain groups go extinct).
Ok, yes, conflict between large groups is something to be worried about, though I don’t much see the connection with germline engineering. I thought we were talking about, like, some liberal/techie/weirdo people have some really really smart kids, and then those kids are somehow a threat to the future of humanity that’s comparable to a fast unbounded recursive self-improvement AGI foom.
Okay, the connection was to point out that lots of humans are not in fact aligned with each other. I don’t particularly think superbabies are a threat to the future of humanity comparable to AGI; my point was more that the alignment problem is not naturally solved in human-to-human interactions.
lots of humans are not in fact aligned with each other,
Ok… so I think I understand and agree with you here. (Though plausibly we’d still have significant disagreement; e.g. I think it would be feasible to bring even Hitler back and firmly away from the death fever if he spent, IDK, a few years or something with a very skilled listener / psychic helper.)
The issue in this discourse, to me, is comparing this with AGI misalignment. It’s conceptually related in some interesting ways, but in practical terms they’re just extremely quantitatively different. And, naturally, I care about this specific non-comparability being clear because it says whether to do human intelligence enhancement; and in fact many people cite this as a reason to not do human IE.
The issue in this discourse, to me, is comparing this with AGI misalignment. It’s conceptually related in some interesting ways, but in practical terms they’re just extremely quantitatively different. And, naturally, I care about this specific non-comparability being clear because it says whether to do human intelligence enhancement; and in fact many people cite this as a reason to not do human IE.
Re human vs AGI misalignment, I’d say this is true, in that human misalignments don’t threaten the human species, or even billions of people, whereas AI does, so in that regard I admit human misalignment is less impactful than AGI misalignment.
Of course, if we succeed at creating aligned AI, then human misalignments matter much, much more.
(Rest of the comment is a fun tangentially connected scenario, but ultimately is a hypothetical that doesn’t matter that much for AI alignment.)
Ok… so I think I understand and agree with you here. (Though plausibly we’d still have significant disagreement; e.g. I think it would be feasible to bring even Hitler back and firmly away from the death fever if he spent, IDK, a few years or something with a very skilled listener / psychic helper.)
At the very least, that would require him to not be in control of Germany by that point, and IMO most successful value changes happen in the child-to-teen years, because that’s when sensitivity to new data is maximal. After that, the plasticity/sensitivity of values goes way down when you are an adult, and changing values is much, much harder.
I’d say this is true, in that human misalignments don’t threaten the human species, or even billions of people, whereas AI does, so in that regard I admit human misalignment is less impactful than AGI misalignment.
Right, ok, agreed.
the plasticity/sensitivity of values goes way down when you are an adult, and changing values is much, much harder.
I agree qualitatively, but I do mean to say he’s in charge of Germany, but somehow has hours of free time every day to spend with the whisperer. If it’s in childhood I would guess you could do it with a lot less contact, though not sure. TBC, the whisperer here would be considered a world-class, like, therapist or coach or something, so I’m not saying it’s easy. My point is that I have a fair amount of trust in “human decision theory” working out pretty well in most cases in the long run with enough wisdom.
I even think something like this is worth trying with present-day AGI researchers (what I call “confrontation-worthy empathy”), though that is hard mode because you have so much less access.
I think it would be feasible to bring even Hitler back and firmly away from the death fever if he spent, IDK, a few years or something with a very skilled listener / psychic helper
There’s an important point to be made here that Hitler was not a genius, and in general the most evil people in history don’t correlate at all to being the smartest people in history. In fact, the smartest people in history generally seemed more likely to contribute positively to the development of humanity.
I would posit it’s easier to make a high IQ child good for society, with positive nurturing.
Perhaps the alignment problem is thus less difficult with “super babies”: they can more easily see the irrationality in poor ethics and think better from first principles, being grounded in the natural alignment that comes from the fact that we are all humans with similar sentience (as opposed to AI, which might as well be a different species altogether).
Given that Hitler’s actions resulted in his death and the destruction of Germany, a much higher childhood IQ might even have blunted his evil.
I also don’t buy the idea that very smart humans automatically assume control. I suspect Kamala, Biden, Hillary, etc. all had higher IQs than Donald Trump, but he became the most powerful person on the planet.
Does the size of this effect, according to you, depend on parameters of the technology? E.g., does it shrink if the technology clearly has a ceiling, such that it’s just not feasible to make humans who are in a meaningful sense 10x more capable than the most capable non-germline-engineered human? Or if the technology is widespread, so that any person / group / state has access if they want it?
My interpretation is that you’re 99% of the way there in terms of work required if you start out with humans rather than creating a de novo mind, even if many/most humans currently or historically are not “aligned”. Like, you don’t need very many bits of information to end up with a nice “aligned” human. E.g. maybe you lightly select their genome for prosociality + niceness/altruism + wisdom, and treat them nicely while they’re growing up, and that suffices for the majority of them.
I’d actually maybe agree with this, though with the caveat that there’s a real possibility you will need a lot more selection/firepower as a human gets smarter, because you lack the ability to technically control humans in the way you can control AIs.
I’d probably bump that down to O(90%) at max, and this could get worse (I’m downranking based on the number of psychopaths/sociopaths and narcissists that exist).
These are incredibly small peanuts compared to AGI omnicide.
The jailbreakability and other alignment failures of current AI systems are also incredibly small peanuts compared to AGI omnicide. Yet they’re still informative. Small-scale failures give us data about possible large-scale failures.
You’re somehow leaving out all the people who are smarter than those people, and who were great for the people around them and humanity? You’ve got like 99% actually alignment or something
Are you thinking of people such as Sam Altman, Demis Hassabis, Elon Musk, and Dario Amodei? If humans are 99% aligned, how is it that we ended up in a situation where major lab leaders look so unaligned? MIRI and friends had a fair amount of influence to shape this situation and align lab leaders, yet they appear to have failed by their own lights. Why?
When it comes to AI alignment, everyone on this site understands that if a “boxed” AI acts nice, that’s not a strong signal of actual friendliness. The true test of an AI’s alignment is what it does when it has lots of power and little accountability.
Maybe something similar is going on for humans. We’re nice when we’re powerless, because we have to be. But giving humans lots of power with little accountability doesn’t tend to go well.
Looking around you, you mostly see nice humans. That could be because humans are inherently nice. It could also be because most of the people around you haven’t been given lots of power with little accountability.
Dramatic genetic enhancement could give enhanced humans lots of power with little accountability, relative to the rest of us.
[Note also, the humans you see while looking around are strongly selected for, which becomes quite relevant if the enhancement technology is widespread. How do you think you’d feel about humanity if you lived in Ukraine right now?]
Which, yes, we should think about this, and prepare and plan and prevent, but it’s just a totally totally different calculus from AGI.
I want to see actual, detailed calculations of p(doom) from supersmart humans vs supersmart AI, conditional on each technology being developed. Before charging ahead on this, I want a superforecaster-type person to sit down, spend a few hours, generate some probability estimates, publish a post, and request that others red-team their work. I don’t feel like that is a lot to ask.
Small-scale failures give us data about possible large-scale failures.
But you don’t go from a 160 IQ person with a lot of disagreeability and ambition, who ends up being a big commercial player or whatnot, to 195 IQ and suddenly get someone who just sits in their room for a decade and then speaks gibberish into a youtube livestream and everyone dies, or whatever. The large-scale failures aren’t feasible for humans acting alone. For humans acting very much not alone, like big AGI research companies, yeah that’s clearly a big problem. But I don’t think the problem is about any of the people you listed having too much brainpower.
(I feel we’re somewhat talking past each other, but I appreciate the conversation and still want to get where you’re coming from.)
For humans acting very much not alone, like big AGI research companies, yeah that’s clearly a big problem.
How about a group of superbabies that find and befriend each other? Then they’re no longer acting alone.
I don’t think the problem is about any of the people you listed having too much brainpower.
I don’t think problems caused by superbabies would look distinctively like “having too much brainpower”. They would look more like the ordinary problems humans have with each other. Brainpower would be a force multiplier.
(I feel we’re somewhat talking past each other, but I appreciate the conversation and still want to get where you’re coming from.)
Thanks. I mostly just want people to pay attention to this problem. I don’t feel like I have unique insight. I’ll probably stop commenting soon, since I think I’m hitting the point of diminishing returns.
I mostly just want people to pay attention to this problem.
Ok. To be clear, I strongly agree with this. I think I’ve been responding to a claim (maybe explicit, or maybe implicit / imagined by me) from you like: “There’s this risk, and therefore we should not do this.”. Where I want to disagree with the implication, not the antecedent. (I hope to more gracefully agree with things like this. Also someone should make a LW post with a really catchy term for this implication / antecedent discourse thing, or link me the one that’s already been written.)
But I do strongly disagree with the conclusion ”...we should not do this”, to the point where I say “We should basically do this as fast as possible, within the bounds of safety and sanity.”. The benefits are large, the risks look not that bad and largely ameliorable, and in particular the need regarding existential risk is great and urgent.
That said, more analysis is definitely needed. Though in defense of the pro-germline-engineering position, there are few resources, and everyone has a different objection.
I will go further and say that human universals are nowhere near strong enough to assume that alignment of much more powerful people will happen automatically or even likely, or that not aligning them produces benevolent results. The reason is that humans are already misaligned with each other, in many cases very severely, so allowing human augmentation without institutional reform makes things a lot worse by default.
It is better to solve the AI alignment problem first, then have a legal structure created by AIs that can make human genetic editing safe, rather than try to solve the human alignment problem:
I honestly think the EV of superhumans is lower than the EV for AI. Sadism and wills to power are baked into almost every human mind (with the exception of outliers, of course). Force-multiplying those instincts is much worse than an AI which simply decides to repurpose the atoms in a human for something else. I think people oftentimes act like the risk ends at existential risks, which I strongly disagree with. I would argue that everyone dying is actually a pretty great ending compared to hyperexistential risks; it is effectively +inf relative utility.
With AIs, we’re essentially putting them through selective pressures to promote benevolence (as a hedge by the labs in case they don’t figure out intent alignment). That seems like a massive advantage compared to the evolutionary baggage associated with humans.
With humans, you’d need the will and capability to engineer in at least +5 SD empathy and −10 SD sadism into every superbaby. But people wouldn’t want their children to make them feel like shitty people, so they would want them to “be more normal.”
Sadism and wills to power are baked into almost every human mind (with the exception of outliers, of course). Force-multiplying those instincts is much worse than an AI which simply decides to repurpose the atoms in a human for something else.
I don’t think the result of intelligence enhancement would be “multiplying those instincts” for the vast majority of people; humans don’t seem to end up more sadistic as they get smarter and have more options.
I would argue that everyone dying is actually a pretty great ending compared to hyperexistential risks; it is effectively +inf relative utility.
I’m curious what value you assign to the ratio [U(paperclipped) - U(worst future)] / [U(best future) - U(paperclipped)]? It can’t be literally infinity unless U(paperclipped) = U(best future).
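To spell out the point (a hedged formalization; U here just denotes the commenter’s utility function over outcomes):

```latex
R \;=\; \frac{U(\text{paperclipped}) - U(\text{worst future})}{U(\text{best future}) - U(\text{paperclipped})},
\qquad
R \to \infty
\;\iff\;
U(\text{best future}) - U(\text{paperclipped}) \to 0
\;\;\text{(numerator bounded away from } 0\text{)}.
```

So claiming the ratio is effectively infinite amounts to claiming the gap between the best future and extinction is negligible next to the gap between extinction and the worst future.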
With humans, you’d need the will and capability to engineer in at least +5 SD empathy and −10 SD sadism into every superbaby.
So your model is that we need to eradicate any last trace of sadism before superbabies is a good idea?
I’m sure you’ve already thought about this, but it seems like the people who would be willing and able to jump through all of the hoops necessary would likely have a higher propensity towards power-seeking and dominance. So if you don’t edit the personality as well, what was it all for besides creating a smarter god-emperor? I think that in the sane world you’ve outlined where people deliberately avoid developing AGI, an additional level of sanity would be holding off on modifying intelligence until we have the capacity to perform the personality edits to make it safe.
I can just imagine this turning into a world where the rich who are able to make their children superbabies compete with the rest of the elite over whose child will end up ruling the world.
I’m sorry, but I’d rather be turned into paper-clips than live in a world where a god-emperor can decide to torture me with their AGI-slave for the hell of it. How is that a better world for anyone but the god-emperor? But people are so blind and selfish, they just assume that they or their offspring would be god-emperor. At least with AI, people are scared enough that they’re putting focused effort into trying to make it nice. People won’t put that much effort into their children.
I mean hell, figuring out personality editing would probably just make things backfire. People would choose to make their kids more ruthless, not less.
I certainly wouldn’t sign up to do that, but the type of individual I’m concerned about likely wouldn’t mind sacrificing nannies if their lineage could “win” in some abstract sense. I think it’s great that you’re proposing a plan beyond “pray the sand gods/Sam Altman are benevolent.” But alignment is going to be an issue for superhuman agents, regardless of if they’re human or not.
Agreed. I’ve actually had a post in draft for a couple of years that discusses some of the parallels between alignment of AI agents and alignment of genetically engineered humans.
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement. But in the long term (multiple generations), it would be a concern.
If you look at the grim history of how humans have treated each other on this planet, I don’t think it’s justified to have a prior that this is gonna go well.
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement.
Humans didn’t have the potential for runaway self-improvement relative to apes. That was little comfort for the apes.
That sounds very interesting! I always look forward to reading your posts. I don’t know if you know any policy people, but in this world, it would need to be punishable by jail-time to genetically modify intelligence without selecting for pro-sociality. Any world where that is not the case seems much, much worse than just getting turned into paper-clips.
I think the runaway self-improvement problem is vastly outweighed by other problems with aligning humans, like the fact that any control technique we use on AI would be illegal to use on humans, being essentially equivalent to brainwashing. As a result, I consider AIs much more alignable than humans, and I think the human intelligence augmentation path is way more risky and fraught for alignment purposes than people think.
I agree. At least I can laugh if the AGI just decides it wants me as paperclips. There will be nothing to laugh about with ruthless power-seeking humans with godlike power.
like the fact that any control technique we use on AI would be illegal to use on humans, being essentially equivalent to brainwashing. As a result, I consider AIs much more alignable than humans
A lot of (most?) humans end up nice without needing to be controlled / “aligned”, and I don’t particularly expect this to break if they grow up smarter. Trying to control / “align” them wouldn’t work anyway, which is also what I predict will happen with sufficiently smart AI.
I think this is my disagreement: I don’t think that most humans are in fact nice/aligned to each other by default. The reason this doesn’t lead to catastrophe, broadly speaking, is a combination of institutions/mechanism design, which means that even if people are misaligned you can still get people well off under certain assumptions (capitalism and the rule of law being one such example), and of the inequalities not being so great that individual humans can found their own societies, except in special cases.
Even here, I’d argue that human autocracies are very often severely misaligned to their citizens’ values.
To be clear about what I’m not claiming: I’m not saying that alignment is worthless, or that alignment always or very often fails. My claim is consistent with a world where >50-60% of alignment attempts are successful.
This means I’m generally much more scared of outlier-smart humans, for example a +7-12 SD human (assuming no other crippling disabilities) in power over a large group of citizens, unless they are very pro-social/aligned to their citizenry.
I’m not claiming that alignment will not work, or even that it will very often not work, but rather that the chance of failure is real and the stakes are quite high long-term.
(And that’s not even addressing how you could get super-smart people to work on the alignment problem).
This is just a definition for the sake of definition, but I think you could define a human as aligned if they could be given an ASI slave and not be an S-risk. I really think that under this definition, the absolute upper bound of “aligned” humans is 5%, and I think it’s probably a lot lower.
I’m more optimistic, in that the upper bound could be as high as 50-60%, but yeah the people in power are unfortunately not part of this, and I’d only trust 25-30% of the population in practice if they had an ASI slave.
So you think that, for >95% of currently living humans, the implementation of their CEV would constitute an S-risk in the sense of being worse than extinction in expectation? This is not at all obvious to me; in what way do you expect their CEVs to prefer net suffering?
(And that’s not even addressing how you could get super-smart people to work on the alignment problem).
I mean if we actually succeeded at making people who are +7 SD in a meaningful way, I’d expect that at least good chunk of them would figure out for themselves that it makes sense to work on it.
That requires either massive personality changes to make them more persuadable, or massive willingness of people to put genetic changes in their germline, and I don’t expect either of these to happen before AI automates everything and either takes over, leaving us extinct, or is successfully controlled/aligned by humans or other AIs.
(A key reason for this is that Genesmith admitted that the breakthroughs in germline engineering can’t transfer to the somatic side, which means we’d have to wait a minimum of 25-30 years for an edited generation to grow up, given that society won’t maximally favor the genetically lucky, and that’s way beyond most plausible AI timelines at this point.)
Because they might consider that other problems are more worth their time, since intelligence enhancement would change their values little.
And maybe they believe that AI alignment isn’t impactful for technical/epistemic reasons.
I’m confused/surprised I need to make this point, because I don’t automatically think they will be persuaded that AI alignment is a big problem they will need to work on, and some effort will likely still need to be required.
Because they might consider that other problems are more worth their time, since intelligence enhancement would change their values little.
I mean if they care about solving problems at all, and we are in fact correct about AGI ruin, then they should predictably come to view it as the most important problem and start to work on it?
Are you imagining they’re super myopic or lazy and just want to think about math puzzles or something? If so, my reply is that even if some of them ended up like that, I’d be surprised if they all ended up like that, and if so that would be a failure of the enhancement. The aim isn’t to create people who we will then carefully persuade to work on the problem, the aim is for some of them to be smart + caring + wise enough to see the situation we’re in and decide for themselves to take it on.
It’s more that I’m imagining they might not even have heard of the argument. It’s helpful to note that people like Terence Tao, Timothy Gowers, and more are all excellent in their chosen fields, but most people who have a big impact on the world don’t go into AI alignment.
Remember, superintelligence is not omniscience.
So I don’t expect them to be self motivated to work on this specific problem without at least a little persuasion.
I’d expect a few superintelligent adults to join alignment efforts, but nowhere near thousands or tens of thousands, and I’d upper bound it at 300-500 new researchers at most in 15-25 years.
How much probability do you assign to automating AI safety not working in time? I believe that preparing to automate AI safety work is probably the highest-value approach in pure ability to reduce X-risk probability, assuming it does work, so I assign much higher EV to automating AI safety relative to other approaches.
I think I’m at <10% that non-enhanced humans will be able to align ASI in time, and if I condition on them succeeding somehow I don’t think it’s because they got AIs to do it for them. Like maybe you can automate some lower level things that might be useful (e.g. specific interpretability experiments), but at the end of the day someone has to understand in detail how the outcome is being steered or they’re NGMI. Not sure exactly what you mean by “automating AI safety”, but I think stronger forms of the idea are incoherent (e.g. “we’ll just get AI X to figure it all out for us” has the problem of requiring X to be aligned in the first place).
As for how a plan to automate AI safety would work out in practice, a relatively strong version of the concept is in the post below; another post by the same author, talking more about the big risks discussed in its comments, is forthcoming:
In general, I think the crux is that in most timelines (at a lower bound, 65-70%) where AGI is developed relatively soon (roughly 2030-2045) and the alignment problem isn’t solvable by default, or is at least non-trivially tricky to solve, conditioning on alignment success looks more like “we’ve successfully figured out how to prepare for AI automation of everything, and we managed to use alignment and control techniques well enough that we can safely pass most of the effort to AI”, rather than other end states like “humans are deeply enhanced” or “lawmakers actually coordinated to pause AI, and are actually giving funding to alignment organizations such that we can make AI safe.”
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement. But in the long term (multiple generations), it would be a concern.
How do you know you can afford to wait multiple generations? My guess is superhuman 6 year olds demonstrating their capabilities on YouTube is sufficient to start off an international arms race for more superhumans. (Increase number of people and increase capability level of each person.) And once the arms race is started it may never stop until the end state of this self-improvement is hit.
I mean hell, figuring out personality editing would probably just make things backfire. People would choose to make their kids more ruthless, not less.
Not at all obvious to me this is true. Do you mean to say a lot of people would, or just some small fraction, and you think a small fraction is enough to worry?
After I finish my methods article, I want to lay out a basic picture of genomic emancipation. Genomic emancipation means making genomic liberty a right and a practical option. In my vision, genomic liberty is quite broad: it would include for example that parents should be permitted and enabled to choose:
to enhance their children (e.g. supra-normal health; IQ at the outer edges of the human envelope); and/or
to propagate their own state even if others would object (e.g. blind people can choose to have blind children); and/or
to make their children more normal even if there’s no clear justification through beneficence (I would go so far as to say that, for example, parents can choose to make their kid have a lower IQ than a random embryo from the parents would be in expectation, if that brings the kid closer to what’s normal).
These principles are narrower than general genomic liberty (“parents can do whatever they please”), and I think they have stronger justifications. I want to make these narrower “tentpole” principles inside of the genomic liberty tent, because the wider principle isn’t really tenable, in part for the reasons you bring up. There are genomic choices that should be restricted—perhaps by law, or by professional ethics for clinicians, or by avoiding making it technically feasible, or by social stigma. (The implementation seems quite tricky; any compromise of full genomic liberty comes with costs as well as preventing them. And at least to some small extent, it erodes the force of genomic liberty’s contraposition to eugenics, which seeks to impose population-wide forces on individuals’ procreative choices.)
Examples:
As you say, if there’s a very high risk of truly egregious behavior, that should be pushed against somehow.
Example: People should not make someone who is 170 Disagreeable Quotient and 140 Unconscientiousness Quotient, because that is most of the way to being a violent psychopath.
Counterexample: People should, given good information, be able to choose to have a kid who is 130 Disagreeable Quotient and 115 Unconscientiousness Quotient, because, although there might be associated difficulties, that’s IIUC a personality profile enriched with creative genius.
People should not be allowed to create children with traits specifically designed to make the children suffer. (Imagine for instance a parent who thinks that suffering, in itself, builds character or makes you productive or something.)
Another thing to point out is that to a significant degree, in the longer term, many of these things should self-correct, through the voice of the children (e.g. if a deaf kid grows up and starts saying “hey, listen, I love my parents and I know they wanted what was best for me, but I really don’t like that I didn’t get to hear music and my love’s voice until I got my brain implant, please don’t do the same for your kid”), and through seeing the results in general. If someone is destructively ruthless, it’s society’s job to punish them, and it’s parents’ job to say “ah, that is actually not good”.
In that case I’d repeat GeneSmith’s point from another comment: “I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement.” If we have a whole bunch of super smart humans of roughly the same level who are aware of the problem, I don’t expect the ruthless ones to get a big advantage.
I mean I guess there is some sort of general concern here about how defense-offense imbalance changes as the population gets smarter. Like if there’s some easy way to destroy the world that becomes accessible with IQ > X, and we make a bunch of people with IQ > X, and a small fraction of them want to destroy the world for some reason, are the rest able to prevent it? This is sort of already the situation we’re in with AI: we look to be above the threshold of “ability to summon ASI”, but not above the threshold of “ability to steer the outcome”. In the case of AI, I expect making people smarter differentially speeds up alignment over capabilities: alignment is hard and we don’t know how to do it, while hill-climbing on capabilities is relatively easy and we already know how to do it.
I should also note that we have the option of concentrating early adoption among nice, sane, x-risk aware people (though I also find this kind of cringe in a way and predict this would be an unpopular move). I expect this to happen by default to some extent.
There are some promising but under-utilized interventions for improving personality traits / virtues in already-developed humans,* and a dearth of research about possible interventions for others. If we want more of that sort of thing, we might be better advised to fill in some of those gaps rather than waiting for a new technology and a new generation of megalopsychebabies.
Putting aside for the moment the fact that even “intelligence” is hardly a well-defined and easily quantified property, isn’t it rather a giant leap to say we even know what a “better” personality is? I might agree that some disorders are reasonably well defined, and those might be candidates for trying to “fix”, but if you’re trying to match greater intelligence with “better” personality I think you first need a far better notion of what “better” personality actually means.
How much do people know about the genetic components of personality traits like empathy? Editing personality traits might be almost as or even more controversial than modifying “vanity” traits. But in the sane world you sketched out this could essentially be a very trivial and simple first step of alignment. “We are about to introduce agents more capable than any humans except for extreme outliers: let’s make them nice.” Also, curing personality disorders like NPD and BPD would do a lot of good for subjective wellbeing.
I guess I’m just thinking of a failure mode where we create superbabies who solve task-alignment and then control the world. The people running the world might be smarter than the current candidates for god-emperor, but we’re still in a god-emperor world. This also seems like the part of the plan most likely to fail. The people who would pursue making their children superbabies might be disinclined towards making their children more caring.
Very little at the moment. Unlike intelligence and health, a lot of the variance in personality traits seems to be the result of combinations of genes rather than purely additive effects.
This is one of the few areas where AI could potentially make a big difference. You need more complex models to figure out the relationship between genes and personality.
But the actual limiting factor right now is not model complexity, but rather data. Even if you have more complex models, I don’t think you’re going to be able to actually train them until you have a lot more data. Probably a minimum of a few million samples.
We’d like to look into this problem at some point and make scaling law graphs like the ones we made for intelligence and disease risk but haven’t had the time yet.
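To illustrate the additive-vs-combination point, here’s a toy simulation (everything in it is invented for illustration; the genotype frequencies, effect sizes, and sample sizes bear no relation to real traits). A plain linear model captures most of the variance of a purely additive simulated trait, but systematically misses part of the variance of a trait driven by pairwise gene-gene interactions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_snps = 2000, 500, 50
n = n_train + n_test

# Simulated genotypes: minor-allele counts in {0, 1, 2}. Made-up numbers.
X = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)

# Purely additive trait: each variant contributes independently.
w = rng.normal(size=n_snps)
y_add = X @ w + rng.normal(size=n)

# Interaction-driven trait: variance comes from pairwise gene-gene products.
pairs = [rng.choice(n_snps, size=2, replace=False) for _ in range(n_snps)]
y_epi = sum(X[:, i] * X[:, j] for i, j in pairs) + rng.normal(size=n)

def holdout_r2(y):
    """Fit ordinary least squares on the training split; report R^2 held out."""
    Xb = np.column_stack([X, np.ones(n)])  # add an intercept column
    beta, *_ = np.linalg.lstsq(Xb[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - Xb[n_train:] @ beta
    return 1.0 - resid.var() / y[n_train:].var()

print(f"additive trait, linear-model holdout R^2: {holdout_r2(y_add):.2f}")
print(f"interaction trait, linear-model holdout R^2: {holdout_r2(y_epi):.2f}")
```

The held-out R² gap is the point: the interaction-driven trait needs either a nonlinear model or, in practice, far more data, since you have to pin down which of the combinatorially many gene pairs actually matter.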
This is starting to sound a lot like AI actually. There’s a “capabilities problem” which is easy, an “alignment problem” which is hard, and people are charging ahead to work on capabilities while saying “gee, we’d really like to look into alignment at some point”.
It’s utterly different.
Humans are very far from fooming.
Fixed skull size; no in silico simulator.
Highly dependent on childhood care.
Highly dependent on culturally transmitted info, including in-person.
Humans, genomically engineered or not, come with all the stuff that makes humans human. Fear, love, care, empathy, guilt, language, etc. (Removing any human universals should be banned, though defining that seems tricky.) So new humans are close to us in values-space, and come with the sort of corrigibility that humans have, which is, you know, not a guarantee of safety, but still some degree of (okay, I’m going to say something that will trigger your buzzword detector, but I think it’s a fairly precise description of something clearly real) radical openness to co-creating shared values.
Tell that to all the other species that went extinct as a result of our activity on this planet?
I think it’s possible that the first superbaby will be aligned, same way it’s possible that the first AGI will be aligned. But it’s far from a sure thing. It’s true that the alignment problem is considerably different in character for humans vs AIs. Yet even in this particular community, it’s far from solved—consider Brent Dill, Ziz, Sam Bankman-Fried, etc.
Not to mention all of history’s great villains, many of whom believed themselves to be superior to the people they afflicted. If we use genetic engineering to create humans who are actually, massively, undeniably superior to everyone else, surely that particular problem is only gonna get worse. If this enhancement technology is going to be widespread, we should be using the history of human activity on this planet as a prior. Especially the history of human behavior towards genetically distinct populations with overwhelming technological inferiority. And it’s not pretty.
So yeah, there are many concrete details which differ between these two situations. But in terms of high-level strategic implications, I think there are important similarities. Given the benefit of hindsight, what should MIRI have done about AI back in 2005? Perhaps that’s what we should be doing about superbabies now.
Individual humans.
These are incredibly small peanuts compared to AGI omnicide.
You’re somehow leaving out all the people who are smarter than those people, and who were great for the people around them and humanity? You’ve got like 99% actual alignment or something, and you’re like “But there’s some chance it’ll go somewhat bad!”… Which, yes, we should think about this, and prepare and plan and prevent, but it’s just a totally totally different calculus from AGI.
I’d flag here that the 99% number seems very easy to falsify based solely on the 20th-century experience of the two world wars, as well as that century’s genocides and civil wars; it’s quite common for one human group to be vastly unaligned with another, causing mass strife and chaos.
I’m saying that (waves hands vigorously) 99% of people are beneficent or “neutral” (like, maybe not helpful / generous / proactively kind, but not actively harmful, even given the choice) in both intention and action. That type of neutral is already in a totally different league of alignment compared to AGI.
Ok, yes, conflict between large groups is something to be worried about, though I don’t much see the connection with germline engineering. I thought we were talking about, like, some liberal/techie/weirdo people have some really really smart kids, and then those kids are somehow a threat to the future of humanity that’s comparable to a fast unbounded recursive self-improvement AGI foom.
I think this is ultimately the crux. At least relative to my values, I’d expect at least 20% of Americans to support active efforts to harm me or my allies / people I’m altruistic towards, and to do so fairly gleefully (an underrated example here is voting for people who will bring mass harm to groups they hate, while hoping that certain groups go extinct).
Okay, the connection was to point out that lots of humans are not in fact aligned with each other. I don’t particularly think superbabies are a threat to the future of humanity comparable to AGI; my point was more that the alignment problem is not naturally solved in human-to-human interactions.
Ok… so I think I understand and agree with you here. (Though plausibly we’d still have significant disagreement; e.g. I think it would be feasible to bring even Hitler back and firmly away from the death fever if he spent, IDK, a few years or something with a very skilled listener / psychic helper.)
The issue in this discourse, to me, is comparing this with AGI misalignment. It’s conceptually related in some interesting ways, but in practical terms they’re just extremely quantitatively different. And, naturally, I care about this specific non-comparability being clear because it bears on whether to do human intelligence enhancement; and in fact many people cite this as a reason not to do human IE.
Re human vs AGI misalignment, I’d say this is true, in that human misalignments don’t threaten the human species, or even billions of people, whereas AI does, so in that regard I admit human misalignment is less impactful than AGI misalignment.
Of course, if we succeed at creating aligned AI, then human misalignment matters much, much more.
(Rest of the comment is a fun tangentially connected scenario, but ultimately is a hypothetical that doesn’t matter that much for AI alignment.)
At the very least, that would require him to not be in control of Germany by that point, and IMO most value-change histories rely on changing values in the child/teen years, because that’s when sensitivity to data is maximal. After that, the plasticity/sensitivity of values goes way down in adulthood, and changing values is much, much harder.
Right, ok, agreed.
I agree qualitatively, but I do mean to say he’s in charge of Germany, but somehow has hours of free time every day to spend with the whisperer. If it’s in childhood I would guess you could do it with a lot less contact, though not sure. TBC, the whisperer here would be considered a world-class, like, therapist or coach or something, so I’m not saying it’s easy. My point is that I have a fair amount of trust in “human decision theory” working out pretty well in most cases in the long run with enough wisdom.
I even think something like this is worth trying with present-day AGI researchers (what I call “confrontation-worthy empathy”), though that is hard mode because you have so much less access.
There’s an important point to be made here that Hitler was not a genius, and in general the most evil people in history don’t correlate at all with the smartest people in history. In fact, the smartest people in history generally seemed more likely to contribute positively to the development of humanity.
I would posit it’s easier to make a high IQ child good for society, with positive nurturing.
The alignment problem is thus perhaps less difficult with “super babies”, because they can more easily see the irrationality in poor ethics and think better from first principles, being grounded in the natural alignment that comes from the fact that we are all humans with similar sentience (as opposed to AI, which might as well be a different species altogether).
Given that Hitler’s actions resulted in his death and the destruction of Germany, a much higher childhood IQ might even have blunted his evil.
I also don’t buy the idea that very smart humans automatically assume control. I suspect Kamala, Biden, Hillary, etc. all had a higher IQ than Donald Trump, but he became the most powerful person on the planet.
My estimate is 97% not sociopaths, but only about 60% inclined to avoid teaming up with sociopaths.
Germline engineering likely destroys most of what we’re trying to save, via group conflict effects. There’s a reason it’s taboo.
Does the size of this effect, according to you, depend on parameters of the technology? E.g. if it clearly has a ceiling, such that it’s just not feasible to make humans who are in a meaningful sense 10x more capable than the most capable non-germline-engineered human? E.g. if the technology is widespread, so that any person / group / state has access if they want it?
My interpretation is that you’re 99% of the way there in terms of work required if you start out with humans rather than creating a de novo mind, even if many/most humans currently or historically are not “aligned”. Like, you don’t need very many bits of information to end up with a nice “aligned” human. E.g. maybe you lightly select their genome for prosociality + niceness/altruism + wisdom, and treat them nicely while they’re growing up, and that suffices for the majority of them.
I’d actually maybe agree with this, though with the caveat that there’s a real possibility you will need a lot more selection/firepower as a human gets smarter, because you lack the ability to technically control humans in the way you can control AIs.
Also true, though maybe only for O(99%) of people.
I’d probably bump that down to O(90%) at max, and this could get worse (I’m downranking based on the number of psychopaths/sociopaths and narcissists that exist).
The jailbreakability and other alignment failures of current AI systems are also incredibly small peanuts compared to AGI omnicide. Yet they’re still informative. Small-scale failures give us data about possible large-scale failures.
Are you thinking of people such as Sam Altman, Demis Hassabis, Elon Musk, and Dario Amodei? If humans are 99% aligned, how is it that we ended up in a situation where major lab leaders look so unaligned? MIRI and friends had a fair amount of influence to shape this situation and align lab leaders, yet they appear to have failed by their own lights. Why?
When it comes to AI alignment, everyone on this site understands that if a “boxed” AI acts nice, that’s not a strong signal of actual friendliness. The true test of an AI’s alignment is what it does when it has lots of power and little accountability.
Maybe something similar is going on for humans. We’re nice when we’re powerless, because we have to be. But giving humans lots of power with little accountability doesn’t tend to go well.
Looking around you, you mostly see nice humans. That could be because humans are inherently nice. It could also be because most of the people around you haven’t been given lots of power with little accountability.
Dramatic genetic enhancement could give enhanced humans lots of power with little accountability, relative to the rest of us.
[Note also, the humans you see while looking around are strongly selected for, which becomes quite relevant if the enhancement technology is widespread. How do you think you’d feel about humanity if you lived in Ukraine right now?]
I want to see actual, detailed calculations of p(doom) from supersmart humans vs supersmart AI, conditional on each technology being developed. Before charging ahead on this, I want a superforecaster-type person to sit down, spend a few hours, generate some probability estimates, publish a post, and request that others red-team their work. I don’t feel like that is a lot to ask.
But you don’t go from a 160 IQ person with a lot of disagreeability and ambition, who ends up being a big commercial player or whatnot, to 195 IQ and suddenly get someone who just sits in their room for a decade and then speaks gibberish into a YouTube livestream and everyone dies, or whatever. The large-scale failures aren’t feasible for humans acting alone. For humans acting very much not alone, like big AGI research companies, yeah, that’s clearly a big problem. But I don’t think the problem is about any of the people you listed having too much brainpower.
(I feel we’re somewhat talking past each other, but I appreciate the conversation and still want to get where you’re coming from.)
How about a group of superbabies that find and befriend each other? Then they’re no longer acting alone.
I don’t think problems caused by superbabies would look distinctively like “having too much brainpower”. They would look more like the ordinary problems humans have with each other. Brainpower would be a force multiplier.
Thanks. I mostly just want people to pay attention to this problem. I don’t feel like I have unique insight. I’ll probably stop commenting soon, since I think I’m hitting the point of diminishing returns.
Ok. To be clear, I strongly agree with this. I think I’ve been responding to a claim (maybe explicit, or maybe implicit / imagined by me) from you like: “There’s this risk, and therefore we should not do this.”. Where I want to disagree with the implication, not the antecedent. (I hope to more gracefully agree with things like this. Also someone should make a LW post with a really catchy term for this implication / antecedent discourse thing, or link me the one that’s already been written.)
But I do strongly disagree with the conclusion ”...we should not do this”, to the point where I say “We should basically do this as fast as possible, within the bounds of safety and sanity.”. The benefits are large, the risks look not that bad and largely ameliorable, and in particular the need regarding existential risk is great and urgent.
That said, more analysis is definitely needed. Though in defense of the pro-germline-engineering position, there are few resources, and everyone has a different objection.
I will go further, and say the human universals are nowhere near strong enough to assume that alignment of much more powerful people will automatically/likely happen, or that not aligning them produces benevolent results. The reason is that humans are already misaligned with each other, in many cases very severely, so allowing human augmentation without institutional reform makes things a lot worse by default.
It is better to solve the AI alignment problem first, then have a legal structure created by AIs that can make human genetic editing safe, rather than try to solve the human alignment problem:
https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies#jgDtAPXwSucQhPBwf
I honestly think the EV of superhumans is lower than the EV for AI. Sadism and the will to power are baked into almost every human mind (with the exception of outliers, of course). Force-multiplying those instincts is much worse than an AI which simply decides to repurpose the atoms in a human for something else. I think people oftentimes act like the risk ends at existential risks, which I strongly disagree with. I would argue that everyone dying is actually a pretty great ending compared to hyperexistential risks. It is effectively +inf relative utility.
With AIs, we’re essentially putting them through selective pressures to promote benevolence (as a hedge by the labs in case they don’t figure out intent alignment). That seems like a massive advantage compared to the evolutionary baggage associated with humans.
With humans, you’d need the will and capability to engineer in at least +5 SD empathy and −10 SD sadism into every superbaby. But people wouldn’t want their children to make them feel like shitty people, so they would want them to “be more normal.”
I don’t think the result of intelligence enhancement would be “multiplying those instincts” for the vast majority of people; humans don’t seem to end up more sadistic as they get smarter and have more options.
I’m curious what value you assign to the ratio [U(paperclipped) - U(worst future)] / [U(best future) - U(paperclipped)]? It can’t be literally infinity unless U(paperclipped) = U(best future).
So your model is that we need to eradicate any last trace of sadism before superbabies is a good idea?
Artificial wombs may remove this bottleneck.
No I mean like a person can’t 10x their compute.
I’m sure you’ve already thought about this, but it seems like the people who would be willing and able to jump through all of the hoops necessary would likely have a higher propensity towards power-seeking and dominance. So if you don’t edit the personality as well, what was it all for besides creating a smarter god-emperor? I think that in the sane world you’ve outlined where people deliberately avoid developing AGI, an additional level of sanity would be holding off on modifying intelligence until we have the capacity to perform the personality edits to make it safe.
I can just imagine this turning into a world where the rich who are able to make their children superbabies compete with the rest of the elite over whose child will end up ruling the world.
I’m sorry, but I’d rather be turned into paper-clips than live in a world where a god-emperor can decide to torture me with their AGI-slave for the hell of it. How is that a better world for anyone but the god-emperor? But people are so blind and selfish, they just assume that they or their offspring would be god-emperor. At least with AI people are scared enough that they’re putting focused effort into trying to make it nice. People won’t put that much effort into their children.
I mean hell, figuring out personality editing would probably just make things backfire. People would choose to make their kids more ruthless, not less.
It’s a fair concern. But the problem of predicting personality can be solved! We just need more data.
I also worry somewhat about brilliant psychopaths. But making your child a psychopath is not necessarily going to give them an advantage.
Also can you imagine how unpleasant raising a psychopath would be? I don’t think many parents would willingly sign up for that.
I certainly wouldn’t sign up to do that, but the type of individual I’m concerned about likely wouldn’t mind sacrificing nannies if their lineage could “win” in some abstract sense. I think it’s great that you’re proposing a plan beyond “pray the sand gods/Sam Altman are benevolent.” But alignment is going to be an issue for superhuman agents, regardless of if they’re human or not.
Agreed. I’ve actually had a post in draft for a couple of years that discusses some of the parallels between alignment of AI agents and alignment of genetically engineered humans.
I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement. But in the long term (multiple generations), it would be a concern.
If you look at the grim history of how humans have treated each other on this planet, I don’t think it’s justified to have a prior that this is gonna go well.
Humans didn’t have the potential for runaway self-improvement relative to apes. That was little comfort for the apes.
That sounds very interesting! I always look forward to reading your posts. I don’t know if you know any policy people, but in this world, it would need to be punishable by jail-time to genetically modify intelligence without selecting for pro-sociality. Any world where that is not the case seems much, much worse than just getting turned into paper-clips.
I think the runaway self-improvement problem is vastly outweighed by other problems with aligning humans, like the fact that any control technique we use on AIs would be illegal on humans, being essentially equivalent to brainwashing. As a result, I consider AIs much more alignable than humans, and I think the human intelligence augmentation path is way more risky and fraught for alignment purposes than people think.
I agree. At least I can laugh if the AGI just decides it wants me as paperclips. There will be nothing to laugh about with ruthless power-seeking humans with godlike power.
A lot of (most?) humans end up nice without needing to be controlled / “aligned”, and I don’t particularly expect this to break if they grow up smarter. Trying to control / “align” them wouldn’t work anyway, which is also what I predict will happen with sufficiently smart AI.
I think this is my disagreement: I don’t think most humans are in fact nice/aligned to each other by default. The reason this doesn’t lead to catastrophe is, broadly speaking, a combination of two things: we can rely on institutions/mechanism design such that even if people are misaligned, you can still make people well off under certain assumptions (capitalism and the rule of law being one such example), and the inequalities aren’t so great that individual humans can found their own societies, except in special cases.
Even here, I’d argue that human autocracies are very often severely misaligned with their citizens’ values.
To be clear about what I’m not claiming: I’m not saying that alignment is worthless, or that it always or very often fails; my claim is consistent with a world where >50-60% of alignment attempts are successful.
This means I’m generally much more scared of very outlier-smart humans, for example a +7-12 SD human in power over a large group of citizens (assuming no other crippling disabilities), unless they are very pro-social/aligned to their citizenry.
I’m not claiming that alignment will not work, or even that it will very often not work, but rather that the chance of failure is real and the stakes are quite high long-term.
(And that’s not even addressing how you could get super-smart people to work on the alignment problem).
This is just a definition for the sake of definition, but I think you could define a human as aligned if they could be given an ASI slave and not be an S-risk. I really think that under this definition, the absolute upper bound of “aligned” humans is 5%, and I think it’s probably a lot lower.
I’m more optimistic, in that the upper bound could be as high as 50-60%, but yeah the people in power are unfortunately not part of this, and I’d only trust 25-30% of the population in practice if they had an ASI slave.
What would it mean for them to have an “ASI slave”? Like having an AI that implements their personal CEV?
Yeah something like that, the ASI is an extension of their will.
So you think that, for >95% of currently living humans, the implementation of their CEV would constitute an S-risk in the sense of being worse than extinction in expectation? This is not at all obvious to me; in what way do you expect their CEVs to prefer net suffering?
I mean, if we actually succeeded at making people who are +7 SD in a meaningful way, I’d expect that at least a good chunk of them would figure out for themselves that it makes sense to work on it.
That requires either massive personality changes to make them more persuadable, or massive willingness of people to put genetic changes in their germline, and I don’t expect either of these to happen before AI automates everything and either takes over (leaving us extinct) or humans/other AIs successfully control/align AIs.
(A key reason for this is that GeneSmith admitted that the breakthroughs in germline engineering can’t transfer to the somatic side, which means we’d have to wait a minimum of 25-30 years for the modified children to grow up, given that society won’t maximally favor the genetically lucky, and that’s way beyond most plausible AI timelines at this point.)
If they’re that smart, why will they need to be persuaded?
Because they might consider other problems more worth their time, since edits for smartness change their values little.
And maybe they believe that AI alignment isn’t impactful for technical/epistemic reasons.
I’m confused/surprised I need to make this point: I don’t automatically think they will be persuaded that AI alignment is a big problem they need to work on, and some persuasion effort will likely still be required.
I mean if they care about solving problems at all, and we are in fact correct about AGI ruin, then they should predictably come to view it as the most important problem and start to work on it?
Are you imagining they’re super myopic or lazy and just want to think about math puzzles or something? If so, my reply is that even if some of them ended up like that, I’d be surprised if they all ended up like that, and if so that would be a failure of the enhancement. The aim isn’t to create people who we will then carefully persuade to work on the problem, the aim is for some of them to be smart + caring + wise enough to see the situation we’re in and decide for themselves to take it on.
It’s more that I’m imagining they might not even have heard of the argument. It’s helpful to note that people like Terence Tao and Timothy Gowers are excellent in their chosen fields, but most people who have a big impact on the world don’t go into AI alignment.
Remember, superintelligence is not omniscience.
So I don’t expect them to be self-motivated to work on this specific problem without at least a little persuasion.
I’d expect a few superintelligent adults to join alignment efforts, but nowhere near thousands or tens of thousands; I’d put the upper bound at 300-500 new researchers over 15-25 years.
Much less impactful than automating AI safety.
I don’t think this will work.
How much probability do you assign to automating AI safety not working in time? I believe that preparing to automate AI safety is probably the highest-value work in pure ability to reduce x-risk, assuming it does work, so I assign much higher EV to automating AI safety relative to other approaches.
I think I’m at <10% that non-enhanced humans will be able to align ASI in time, and if I condition on them succeeding somehow I don’t think it’s because they got AIs to do it for them. Like maybe you can automate some lower level things that might be useful (e.g. specific interpretability experiments), but at the end of the day someone has to understand in detail how the outcome is being steered or they’re NGMI. Not sure exactly what you mean by “automating AI safety”, but I think stronger forms of the idea are incoherent (e.g. “we’ll just get AI X to figure it all out for us” has the problem of requiring X to be aligned in the first place).
As for how a plan to automate AI safety would work out in practice, a relatively strong version of the concept is laid out in the post below, and another post by the same author, addressing the big risks discussed in its comments, is forthcoming:
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
In general, I think the crux is this: in most timelines (at a lower bound, 65-70%) where AGI is developed relatively soon (roughly 2030-2045) and the alignment problem isn’t solvable by default, or is at least non-trivially tricky to solve, conditioning on alignment success looks more like “we successfully figured out how to prepare for AI automation of everything, and we used alignment and control techniques well enough to safely hand off most of the effort to AI”, rather than other end states like “humans are deeply enhanced” or “lawmakers actually coordinated to pause AI and are actually giving alignment organizations enough funding to make AI safe.”
How do you know you can afford to wait multiple generations? My guess is superhuman 6 year olds demonstrating their capabilities on YouTube is sufficient to start off an international arms race for more superhumans. (Increase number of people and increase capability level of each person.) And once the arms race is started it may never stop until the end state of this self-improvement is hit.
Not at all obvious to me this is true. Do you mean to say a lot of people would, or just some small fraction, and you think a small fraction is enough to worry?
I should have clarified: I meant a small fraction, and that a small fraction is enough to worry about.
After I finish my methods article, I want to lay out a basic picture of genomic emancipation. Genomic emancipation means making genomic liberty a right and a practical option. In my vision, genomic liberty is quite broad: it would include for example that parents should be permitted and enabled to choose:
to enhance their children (e.g. supra-normal health; IQ at the outer edges of the human envelope); and/or
to propagate their own state even if others would object (e.g. blind people can choose to have blind children); and/or
to make their children more normal even if there’s no clear justification through beneficence (I would go so far as to say that, for example, parents can choose to make their kid have a lower IQ than a random embryo from the parents would be in expectation, if that brings the kid closer to what’s normal).
These principles are narrower than general genomic liberty (“parents can do whatever they please”), and I think they have stronger justifications. I want to make these narrower “tentpole” principles inside of the genomic liberty tent, because the wider principle isn’t really tenable, in part for the reasons you bring up. There are genomic choices that should be restricted—perhaps by law, or by professional ethics for clinicians, or by avoiding making it technically feasible, or by social stigma. (The implementation seems quite tricky; any compromise of full genomic liberty comes with costs as well as preventing costs. And at least to some small extent, it erodes the force of genomic liberty’s opposition to eugenics, which seeks to impose population-wide forces on individuals’ procreative choices.)
Examples:
As you say, if there’s a very high risk of truly egregious behavior, that should be pushed against somehow.
Example: People should not make someone who is 170 Disagreeable Quotient and 140 Unconscientiousness Quotient, because that is most of the way to being a violent psychopath.
Counterexample: People should, given good information, be able to choose to have a kid who is 130 Disagreeable Quotient and 115 Unconscientiousness Quotient, because, although there might be associated difficulties, that’s IIUC a personality profile enriched with creative genius.
People should not be allowed to create children with traits specifically designed to make the children suffer. (Imagine for instance a parent who thinks that suffering, in itself, builds character or makes you productive or something.)
Case I’m unsure about, needs more investigation: Autism plus IQ might be associated with increased suicidal ideation (https://www.sciencedirect.com/science/article/abs/pii/S1074742722001228). Not sure what the implication should be.
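The “quotient” scores in the examples above follow IQ-style norms (mean 100, SD 15), so it’s easy to see how extreme the egregious case is relative to the permissible one. A minimal sketch, assuming trait scores are normally distributed (the function name and the normality assumption are mine, not from the thread):

```python
from math import erf, sqrt

def quotient_to_percentile(score, mean=100.0, sd=15.0):
    """Cumulative percentile of a score on an IQ-style scale
    (mean 100, SD 15), assuming a normal distribution."""
    z = (score - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A 170 quotient is about 4.7 SD above the mean: under a normal
# model, well past the 1-in-100,000 level of rarity.
rarity_170 = 1.0 - quotient_to_percentile(170)

# A 130 quotient is 2 SD above the mean: roughly the top 2-3%,
# unusual but common enough to be a familiar personality type.
rarity_130 = 1.0 - quotient_to_percentile(130)
```

The point of the arithmetic is that the prohibited profile (170/140) is several orders of magnitude more extreme than the permitted one (130/115), so the tentpole principle is drawing a line between rare-but-normal variation and engineered outliers beyond the human envelope.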
Another thing to point out is that to a significant degree, in the longer term, many of these things should self-correct, through the voice of the children (e.g. if a deaf kid grows up and starts saying “hey, listen, I love my parents and I know they wanted what was best for me, but I really don’t like that I didn’t get to hear music and my love’s voice until I got my brain implant, please don’t do the same for your kid”), and through seeing the results in general. If someone is destructively ruthless, it’s society’s job to punish them, and it’s parents’ job to say “ah, that is actually not good”.
In that case I’d repeat GeneSmith’s point from another comment: “I think we have a huge advantage with humans simply because there isn’t the same potential for runaway self-improvement.” If we have a whole bunch of super smart humans of roughly the same level who are aware of the problem, I don’t expect the ruthless ones to get a big advantage.
I mean I guess there is some sort of general concern here about how defense-offense imbalance changes as the population gets smarter. Like if there’s some easy way to destroy the world that becomes accessible with IQ > X, and we make a bunch of people with IQ > X, and a small fraction of them want to destroy the world for some reason, are the rest able to prevent it? This is sort of already the situation we’re in with AI: we look to be above the threshold of “ability to summon ASI”, but not above the threshold of “ability to steer the outcome”. In the case of AI, I expect making people smarter differentially speeds up alignment over capabilities: alignment is hard and we don’t know how to do it, while hill-climbing on capabilities is relatively easy and we already know how to do it.
I should also note that we have the option of concentrating early adoption among nice, sane, x-risk aware people (though I also find this kind of cringe in a way and predict this would be an unpopular move). I expect this to happen by default to some extent.
There are some promising but under-utilized interventions for improving personality traits / virtues in already-developed humans,* and a dearth of research about possible interventions for others. If we want more of that sort of thing, we might be better advised to fill in some of those gaps rather than waiting for a new technology and a new generation of megalopsychebabies.
Imagine Star Trek if Khan were also engineered to be a superhumanly moral person.
Putting aside for the moment the fact that even “intelligence” is hardly a well-defined and easily quantified property, isn’t it rather a giant leap to say we even know what a “better” personality is? I might agree that some disorders are reasonably well defined, and those might be candidates for trying to “fix”, but if you’re trying to match greater intelligence with “better” personality I think you first need a far better notion of what “better” personality actually means.