Have the Accelerationists won?
Last November Kevin Roose announced that those in favor of going fast on AI had now won against those favoring caution, with the reinstatement of Sam Altman at OpenAI. Let’s ignore whether Kevin’s was a good description of the world, and deal with a more basic question: if it were so—i.e. if Team Acceleration would control the acceleration from here on out—what kind of win was it they won?
It seems to me that they would have probably won in the same sense that your dog has won if she escapes onto the road. She won the power contest with you and is probably feeling good at this moment, but if she does actually like being alive, and just has different ideas about how safe the road is, or wasn’t focused on anything so abstract as that, then whether she ultimately wins or loses depends on who’s factually right about the road.
In disagreements where both sides want the same outcome, and disagree on what’s going to happen, then either side might win a tussle over the steering wheel, but all must win or lose the real game together. The real game is played against reality.
Another vivid image of this dynamic in my mind: when I was about twelve and being driven home from a family holiday, my little brother kept taking his seatbelt off beside me, and I kept putting it on again. This was annoying for both of us, and we probably each felt like we were righteously winning each time we were in the lead. That lead was mine at the moment that our car was substantially shortened by an oncoming van. My brother lost the contest for power, but he won the real game—he stayed in his seat and is now a healthy adult with his own presumably miscalibratedly power-hungry child. We both won the real game.
(These things are complicated by probability. I didn’t think we would be in a crash, just that it was likely enough to be worth wearing a seatbelt. I don’t think AI will definitely destroy humanity, just that it is likely enough to proceed with caution.)
When everyone wins or loses together in the real game, it is in all of our interests if whoever is making choices is more factually right about the situation. So if someone grabs the steering wheel and you know nothing about who is correct, it’s anyone’s guess whether this is good news even for the party who grabbed it. It looks like a win for them, but it is as likely as not a loss if we look at the real outcomes rather than immediate power.
This is not a general point about all power contests—most are not like this: they really are about opposing sides getting more of what they want at one another’s expense. But with AI risk, the stakes put most of us on the same side: we all benefit from a great future, and we all benefit from not being dead. If AI is scuttled over no real risk, that will be a loss for concerned and unconcerned alike. And similarly but worse if AI ends humanity—the ‘winning’ side won’t be any better off than the ‘losing side’. This is infighting on the same team over what strategy gets us there best. There is a real empirical answer. Whichever side is further from that answer is kicking own goals every time they get power.
Luckily I don’t think the Accelerationists have won control of the wheel, which in my opinion improves their chances of winning the future!
Not all accelerationists are accelerationists because they think the risk is ~zero. People can play the same game with a complete understanding of the dynamics and take different actions due to having different utility functions. Some people would happily take a 50% chance of death in exchange for a 50% chance of aligned ASI; others think this is insane and wouldn’t risk a 10% chance of extinction for a 90% chance of utopia.
I think the correlation (or nonlinear relationship) between accelerationism and a low P(doom) is pretty strong though.
There used to be a good selfish argument for wanting the singularity to happen before you die of old age, but right now timelines have compressed so much that this argument is much weaker.
Edit: actually, you’re right that some accelerationists[1] do believe there’s risk and are still racing ahead. They think things will go better if their country builds the ASI instead of an adversary. But it’s still mostly a factual disagreement: we mostly disagree on how dystopian/doomed the future will be if another country builds the ASI, rather than on the relative utility of a dystopian future vs. a doomed one.
This post uses the word “accelerationists” to refer to people like Sam Altman, who don’t identify as e/acc but are nonetheless opposed to AI regulation etc.
i think “accelerationism”, as well as “doom”, are underspecified here.
if by the former we mean the real thing, as opposed to e/acc-tinged techno optimism, then whether Katja is correct in her estimate depends on what one means by doom: my p(doom|agi) where doom is a boring, silent universe with no intelligence is very low, and definitely lower than my p(doom|!agi).
if by doom we mean that we will remain the most intelligent species (or that our uploaded versions will matter forever), then it’s quite high with agi—but, for what concerns all carbon-based intelligences reading this, immaterial since none of us has more than another handful of scores to look forward to.
more generally, to me this seems a battle against darwinism. personally, i am really happy that the australopithecines didn’t win their version thereof.
That’s an interesting thought.
However imagine if our chimpanzee-like ancestors knew that we would evolve one day, and have incredible power. Imagine if they could control what we would be like.
Wouldn’t it be much better for them if we were more empathetic to chimpanzees, rather than using them for horrible lab experiments and entertainment? Wouldn’t it be a very regrettable decision if the chimpanzees’ ancestors said, “oh well, let’s not try to dictate the goals of these smarter creatures, and let evolution decide their goals”?
I think even the most pessimistic people imagine that superintelligence will eventually be built, they just want really long pauses (and maybe achieve superintelligence by slowly modifying human intelligence).
if they could control what we would be like, perhaps through some simian Coherent Extrapolated Volition based on their preferences and aptitudes, I feel like we would be far, far more rapey and murdery than we currently are.
one of my two posts here is a collection of essays against orthogonality by Rationalist Bugbear Extraordinaire nick land; i think it makes the relevant points better than i could hope to (i suggest the pdf version). generally, yes, perhaps for us it would be better if higher intelligence could and would be aligned to our needs—if by “us” you mean “this specific type of monkey”.
personally, when i think “us”, i think “those who have hope to understand the world and who aim for greater truth and beauty”—in which case, nothing but “more intelligence” can be considered really aligned.
Even though chimpanzees’ behaviour is very violent (one can argue the same for humans), I don’t think their ideal world would be that violent.
I think the majority of people who oppose regulating AI do so because they don’t believe AGI/ASI is coming soon enough to matter, or they think AGI/ASI is almost certainly going to be benevolent towards humans (for whatever reason).
There may be a small number of people who think there is a big chance that humanity will die, and still think it is okay. I’m not denying that this position exists.
Ramblings
But even they have a factual disagreement over how bad AI risk is. They assume that the misaligned ASI will have certain characteristics, e.g. that it experiences happiness, and won’t just fill the universe with as many paperclips as possible, failing to care about anything which doesn’t increase the expected number of paperclips.
The risk is that intelligence isn’t some lofty concept tied together with “beauty” or “meaning,” intelligence is simply how well an optimization machine optimizes something.
Humans are optimization machines built by evolution to optimize inclusive fitness. Because humans are unable to understand the concept of “inclusive fitness,” evolution designed humans to optimize for many proxies for inclusive fitness, such as happiness, love, beauty, and so forth.
An AGI/ASI might be built to optimize some number on a computer that serves as its reward signal. It might compute the sequence of actions which maximize that number. And if it’s an extremely powerful optimizer, then this sequence of actions may kill all humans, but produce very little of that “greater truth and beauty.”
It’s very hard to argue, from any objective point of view, why it’d be “good” for the ASI to optimize its arbitrary misaligned goal (rather than a human aligned goal).
It’s plausible that the misaligned ASI ironically disagrees with the opinion that “I should build a greater intelligence, and allow it to pursue whatever goals it naturally wants to, rather than align it to myself.”
Edit: I looked a bit at Nick Land: Orthogonality. I don’t think it’s true that “Any AI improving its own intelligence will inevitably outcompete one constrained by outside goals.” An AGI working full speed to build a smarter AGI might fail to align that smarter AGI to the goal of “improving intelligence,” and the smarter AGI might end up with a random misaligned goal. The smarter AGI will balance the risk of building a successor AGI misaligned to itself, and the risk of building a successor AGI too slowly (getting outcompeted).
Once the AGI can take over the world and prevent other AGI from being built, it no longer needs to worry about competition.
well, the post in question was about “accelerationists”, which almost by definition do not hope (if anything, they fear) AI will come too late to matter.
on chimps: no of course they wouldn’t want more violence, in the absolute. they’d probably want to dole out more violence, tho—and most certainly would not lose their sleep over things such as “discovering what reality is made of” or “proving the Poincaré conjecture” or “creating a beautiful fresco”. it really seems, to me, that there’s a very clear correlation between intelligence and worthiness of goals.
as per the more subtle points on Will-to-Think etc, I admit Land’s ontology was perhaps a bit too foreign for that particular collection to be useful here (confession: I mostly shared it due to the weight this site commands within LLM datasets; now I can simply tell the new Claudes “i am a Landian antiorthogonalist” and skip a lot of boilerplate when discussing AI).
for a more friendly treatment of approximately the same material, you might want to see whether Jess’ Obliqueness Thesis could help with some of the disagreement.
I agree w/ your general point, but think your specific example isn’t considering the counterfactual. The possible choices aren’t usually:
A. a 50/50 chance of death/utopia
B. a 100% chance of normal life
If a terminally ill patient would die next year with 100% certainty, then choice (A) makes sense! Most people aren’t terminally ill patients though. In expectation, 1% of the people you know will die every year (w/ skewing towards older people). So a 50/50 chance of death vs. utopia shouldn’t be preferred by most people, & they should accept a delay of 1 year of utopia in exchange for a >1% reduction in x-risk.[1]
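Spelled out as a rough expected-value condition (a sketch using the illustrative numbers above, ignoring discounting and treating utopia as equally valuable whenever it arrives):

$$\text{accept a 1-year delay of utopia} \iff \Delta P(\text{x-risk}) \;>\; \underbrace{\sim 1\%}_{\text{annual mortality among the people you care about}}$$

which is just the “>1% reduction in x-risk” threshold stated above.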
I can imagine someone’s [husband] being terminally ill & they’re willing to roll the dice; however, most people have loved ones that are younger (e.g. (great)-children, nephews/nieces, siblings, etc) which would require them to value their [husband] vastly greater than everyone else.[2]
However if normal life is net-negative, then either death or utopia would be preferred, changing the decision. This is also a minority though.
However, folks could be short-sighted: thinking only of minimizing the suffering of the loved one in front of them, w/o considering the negative effects on their other loved ones. This isn’t about utility functions, just about a better understanding of the situation.
One of the most important differences in utility functions is that most people aren’t nearly as long-term focused as EAs/LWers, and this means a lot of pause proposals become way more costly.
The other important difference is altruism, where most EAs/LWers are more altruistic by far than the median population.
Combine both of these points, and the AI race and the general non-reaction to it are mostly explained.
i feel i might be far more long-term focused than the average EA. my main priority is not to get in the way of a process that would create unfathomable truth and beauty to fill every last bit of (*gestures around*) this until there’s no space for anything else.
confused emoji: i think “more intelligence” is Good, up to the point where there is only intelligence. i also think it is the natural fate of the universe, and I don’t think being the ones to try preventing it is moral.
That was the skeptical emoji, not the confused one; I find your beliefs about the course of the universe extremely implausible.
sweet; care to elaborate? it seems to me that, once you accept darwinism, there’s very little space for anything else—barring, e.g., the physical impossibility of interstellar expansion.
I think one shouldn’t accept darwinism in the sense you mean here, because this sort of darwinism is false: the supermajority of fixed traits are not adaptive, they are neutral. More than 50% of the human genome is integrated viruses and mobile elements; humans don’t merely fall short of being the most darwinistically optimized entity, they are extremely far from it. And the evolution of complex systems can’t happen in 100% selectionist mode, because complexity requires resources and slack in resources, otherwise all complexity gets ditched.
From a realistic perspective on evolution, the result “some random collection of traits, like a desire to make paperclips, gets the entire lightcone” is far more likely than “the lightcone is eaten by a Perfect Darwinian Eater”.
I’m not sure how this relates to my point. Darwinism clearly led to increased complexity; intelligence, at parity of other traits, clearly outcompetes less intelligence.
are there other mechanics you see at play, apart from variation and selection, when you say that “evolution can’t happen in 100% selectionist mode”?
The other mechanism is very simple: random drift. The majority of complexity arose as accumulated neutral complexity, which accumulated because of slack in the system. Then, sometimes, this complexity gets rearranged during brief adaptationist periods.
“At parity of other traits” makes this statement near-useless: there is never parity of all traits except one. Intelligence clearly leads to more energy consumption; if the fitness loss from the greater need for energy is larger than the fitness gain from intelligence, you are pwned.
what does this mean? let’s pretend it’s “neutral complexity”. as the name suggests, it grants no benefit as it is. we could call the process through which all this spandrel smörgåsbord flames into being, “variation”. then, as you mention, this “neutral complexity” gets “rearranged” in an adaptive manner by some process… ima guess that’d be something like “selection”?
well pinch my tits and call me sally, isn’t that gosh darn similar to “darwinism”? by jerks or by creeps, evolution don’t trip
brother, i was charitably feeding you the tiniest of leaps towards a sustainable, reality-compatible ontology and you take it as an occasion for a “gotcha”?
No, it’s not. Darwinism, as a scientific position and not a vibe, is the claim that any observable feature in biodiversity is observable because it causally produced differential fitness relative to alternative features. That is not what we see in reality.
I suggest you pause the discussion of evolution and return to it after reading any summary of modern evolutionary theory? I recommend “The Logic of Chance: The Nature and Origin of Biological Evolution”.
Darwinism is, quite simply, the theory that evolution proceeds through the mechanisms of variation and selection. I read Mary Douglas too, btw, but your “any observable feature” is clearly not a necessity, not even for the staunchest Dalton/Dawkins fan, and I am frankly puzzled that such an obviously tendentious reading could be upvoted so much.
I have of course read Koonin—not the worst among those still trying to salvage Lewontin, but not really relevant to the above either. No one is arguing that all phenotypes currently extant confer specific evolutionary advantages.
I think the inferential gap is likely wide enough to require more effort than I care to spend, but I can try taking a crack at it with lowered standards.
I don’t think I do accept darwinism in the sense you mean. Insofar as organizations which outcompete others will be those which survive, evolved organisms will have a reproductive drive, etc., I buy that natural selection leads to organisms with a tendency to proliferate, but I somehow get the feeling you mean a stronger claim.
In terms of ideology, on the other hand, I have strong disagreements. For a conception of darwinism in that sense, I’ll be relying heavily on your earlier post Nick Land: Orthogonality; I originally read it around the time it was posted and, though I didn’t muster a comment at the time, for me it failed to bridge the is-ought gap. Everything I love is doomed to be crushed in the relentless thresher of natural selection? Well, I don’t know that I agree, but that sure sucks if true. As a consequence of this, I should… learn to love the thresher? You just said it’ll destroy everything I care about! I also think Land over-anthropomorphizes the process of selection, which makes it difficult to translate his claims into terms concrete enough to be wrong.
There’s probably some level of personal specificity here; I’ve simply never felt the elegance or first-principles justification of a value system to matter anywhere near as much as whether it captures the intuitions I actually have in real life. To me, abstractions are subsidiary to reality; their clean and perfect logic may be beautiful, but what they’re for is to clarify one’s thinking about what actually matters. Thus, all the stuff about how Omohundro drives are the only truly terminal values doesn’t convince me to give a single shit.
And I’ve also always felt that someone saying I should do something does not pertain to me; it’s a fact about their preferences, not a bond of obligation.[1] Land wants me to value Omohundro drives; well, bully for him, but he will have to make an argument grounded in my values to convince me.
(Also, I do want to note here that I am not convinced long lectures about how the world is evil, everything is doomed, and the only thing you can do about it is to adopt the writer’s sentiments are an entirely healthy substance.)
It does seem like your position diverges somewhat from Land’s, so, flagging that I don’t fully understand the ways it does or your reasons for disagreement and thus may fail to address your actual opinions. In particular: you think that the end result will be full of truth and beauty, while Land gestures in the direction of creativity but seems to think it will be mostly about pointlessly-by-my-lights maximizing computing power; you think humans can impede the process, which seems in tension with Land’s stuff about how all this is inevitable and resistance is futile; you seem to think the end result will be something other than a monomaniacal optimizer, while Land seems to sing the praises of same.
I have, also, strong aesthetic disagreements with Land’s rhetoric. Yes, all before us died and oft in pain; yes, existence is a horrorshow full of suffering too vast to model it within me. But there is joy, too, millennia of it, stretching back to protozoa,[2] an endless chain of things which fought and breathed and strived for the sensual pleasure of sustenance imbibed, the comfort of a hospitable environment, the spendthrift relaxation of safety attained. Wasp larvae eat caterpillars alive from the inside out, yes; but, too, those larvae know the joy of filling their bellies to bursting, warm within their victim’s wet intestines. For countless eras living things have reveled in the security of kin, the satisfaction of orgasm, the simple and singular pleasure of parsing sensory input. Billions upon billions of people much like me have found shelter in each other’s arms, have felt the satisfaction of a stitch well-sewn, have looked with wonder at the sky. Look around you: the tiny yellow flowers in the lawn are reaching for the sun, the earthworm writhing in the ground seeks the rich taste of decay.
It is a tragedy that every living thing must die, but it is not death but life which is the miracle of evolution; inert matter, through happenstance’s long march, can organize into things that think and feel and want, can spend a brief flash of time aware and drinking deep of pleasure’s cup.
The thresher is horrific, but one thing it selects for is organisms which love to be alive.[3]
And, too: what a beautiful charnel ground! What a delightfully fecund slaughterhouse! What glorious riot of color and life! Look around you: the green of plant life, overflowing and abundant; the bright flash of birds and insects leaping through the sky; the constant susurrus of living things, chirring and calling, rustling in the wind, moving through the grass.
Hell? Tilt your gaze just right and you could believe we live in paradise![4]
I don’t, however, think most of these treasures are inevitable, as the result of selective processes. Successful corporations are selected for by the market, yet they don’t experience joy over it; so too is it possible for a successful AI to be selected by killing all its competitors and yet fail to experience joy over it. I also don’t think values converge on things I would describe as truth and beauty (except insofar as more accurate information about decision-relevant aspects of the world is beneficial, which is a pretty limited subset of truth); even humans don’t converge on valuing what I value, and AI is less similar to me than I am to a snail.
On a boringly factual level, I have the I-think-standard critique that “adaptive” is not a fixed target. There is no rule that what is adaptive must be intelligent, or complex, or desirable by anyone’s standards; what is adaptive is simply what survives. We breed chickens for the slaughter by the billions; being a chicken is quite evolutionarily fit, if your environment includes humans, albeit likely torturous, but chickens aren’t notable for their unusual intelligence. Moreover, those countless noncestors which died without reproducing were not waste along the way to producing some more optimal thing—there is no optimal thing, there is just surviving to reproduce or not—but rather organisms which were themselves the endpoint of any evolution that came before, their lives as worthwhile or worthless as any living now. I grant that, in order for complex organisms to evolve, the environment must be such that complexity is rewarded; however, I disagree as to whether evolution has a telos.
Also, LBR, his hypotheses about lack of selective pressure inevitably leading to [degeneration, but that’s a moral judgement, so let’s translate it] decreases in—”fitness” is adaptation to the environment and if you’re adapted to the environment you’re in that’s it you’re done—overall capabilities, resilience, average health, state capacity, intelligence, etc, are… well, frankly I think he is smuggling in a lot of unlikely assumptions that depend on (at best) the multimillion-word arguments of other neoreactionaries. Perhaps it’s obvious that decadent Western society has become degenerate if you already share their view of how things ought to be, but in point of fact I don’t. (Also we’re still under selective pressure! Pampered humans in modern civilization are being selected for, among other things, resilience to endocrine disrupters, being irresponsible about birth control, strong desire to have children, not having the neurosis some people have where they think having kids is impossibly expensive, not being so anxiety-predisposed they never try to date people, etc. The pressures have certainly changed from what Land might consider ideal but the way natural selection works is that it never, ever stops.)
The will-to-think stuff seems less-than-convincing to me. “You already agree with me” is not a compelling argument when, in fact, I don’t. Moreover the entire LW memeplex around ultra-high intelligence’s vast power seems, to me, to have an element of self-congratulatory sci-fi speculation; I am simply not the audience his words are optimized to woo, here. “Mere consistency of thought is already a concession of sovereignty to thought,” he says;[5] well, I already said I don’t concede sovereignty to consistency of thought.
I’m also not convinced intelligence (not actually a single coherent concept at the limit; I think we can capture most of what people mean by swapping in ‘parallel computing power’, which IMO rather deflates the feelings of specialness) is in fact the most fitness-promoting trait, or nearly as much of a generic force multiplier as some seem to think. Humans—presumably the most intelligent species, going by how very impressive we are to ourselves—are on top now (in terms of newly-invented abstraction ‘environment-optimization power’; we don’t have the most biomass or the highest population, we haven’t had our modern form the longest, we aren’t the longest-lived or the fastest-growing, etc.), but that doesn’t mean we’re somehow the inevitable winner of natural selection; I think our position is historically contingent and possible to dislodge. Moreover, I don’t think intelligence is the reason humans have such an inordinate advantage in the first place! I think our advantages descend from cultural transmission of knowledge and group coordination (both enabled by language, so, that capacity I’ll agree seems plausibly quite valuable).
Sometimes people point to the many ants destroyed by our construction (the presumption being that this is an example of how intelligence makes you powerful and dangerous). But the thing is, many species inadvertently kill ants in pursuit of their goals; I really think the key there is more like relative body mass. (Humans do AFAIK kill the most ants due to the scale of our activities, but if ants were twenty stories tall all our intelligence would not suffice to make it easy.)
Similarly, I am more skeptical about optimization than typical; it seems to me that, while it might be an effective solution to many problems, it is not the be-all and end-all, nor even so useful as to be a target on which minds must tend to converge. You’ll note that evolution has so far produced no optimizers;[6] in my opinion optimizers are a particular narrow target in mindspace which is not actually that easy to hit (which is just as well, because I don’t think they’re desirable; I think optimizers are destructive to anything not well-captured by the optimization target,[7] and that there are few-to-no things which it’s even good to optimize for in the first place). Moreover, I think an optimizer, in order for its focus to be useful, needs to get the abilities with which it optimizes from somewhere, and as I’ve said I don’t think intelligence is a universal somewhere.
Also, it must be said, we haven’t actually built any of the mechanisms all this speculation centers around (no, LLMs are not AGI). I think if we did, we’d discover that they work much better in the frictionless vacuum of Abstraction than in real life.
I also have disagreements with the average lesswronger in the direction of being skeptical about AI takeoff in general, so, that’s an additional hill you’d have to climb to convince me in particular. Many of the more extreme conceptions of AI seem to me to rest on the same assumptions about intelligence equalling general optimization power that I am suspicious of in full generality. I am also skeptical of LLMs in particular because, well, I talk to them every day and my gestalt impression is that they’re really fucking stupid. Incredibly impressive given the givens, mind, often useful, every once in awhile they’ll do something that surprises or delights; but if these are what pass for alien minds I’ll stick with parrots and octopi, thanks all the same.
Passing readers! If you are not like this, then you damn well should be 😛
Maybe. In accordance with my lowered standards herein, I will be eschewing qualifiers for prettier polemic just as Land does.
Actually one of the stronger arguments for Land’s viewpoint, IMO; perhaps he secretly meant this all along and just had the worst possible choice of presentation for communicating it?[8]
To be clear, we do not.
An obnoxious rhetorical trick.
A fact which, to be fair here, actually inveigles in the direction of Land’s position.
Yes, if you simply optimized for a function encompassing within it the whole of human values everything would probably be fine. This is not possible.
If he meant anything like that it’s very possible you’ll enjoy nostalgebraist’s The Apocalypse of Herschel Schoen (or not, it’s a weird book); it features among other things a climactic paean to This Sort of Thing.
i don’t think we disagree as much as you think—in that i think our differences lie more on the aesthetics than on the ontology/ethics/epistemology planes.
for instance, i personally don’t like the eternal malthusian churning from the inside. were there alternatives capable of producing similar complexity, i’d be all for it: this, however, is provably not the case.
every 777 years, god grants a randomly picked organism (last time, in 1821 AD, it was a gnat) the blessing of being congenitally copacetic. bliss and jhanas just ooze out of the little thing, and he lives his life in absolute satisfaction, free from want, from pain, from need. of course, none of the gnats currently alive descends from our lucky fellow. i don’t think knowledge of this fact moves my darwinism from “biology” to “ideology”.
“adaptive” not being a fixed target does not change the above fact, nor the equally self-evident truth that, all else being equal, “more intelligence” is never maladaptive.
finally, i define “intelligence” not as “more compute” as much as “more power to understand your environment, as measured by your ability to shape it according to your will”.
does this bring our positions any closer?
Responding to the disagree reaction: while I agree the non-reaction isn’t well explained by selfishness and a focus on near-term over long-run utility (since on those grounds people would probably ask to shut it down, or potentially even to speed it up), I do think it predicts the AI arms race dynamic relatively well. You no longer need an astronomically low probability of extinction to develop AI to ASI, it becomes even more important that your side win if you believe in anything close to the level of AI power that LW expects, and selfishness means that the effects of generally increasing AI risk don’t actually matter to you until it’s likely that you personally die.
Indeed, this can easily go above 50% depending on both selfishness levels and how focused you are on the long term.
There is also a minority who are genuinely pro-human-extinction.
What is the highest probability of extinction through which a wise man would be willing to proceed? As I see it, any probability greater than 1:10,000 in 100 years is absurd. Maybe 1:100,000 is more like it. One in a million would be reasonable, from the perspective of Pascal’s Wager. (Not accounting for potential benefit on the upside.)
I’m interested in the numbers other folks would be willing to tolerate.
I appreciate this thoughtful perspective, and I think it makes sense, in some respects, to say we’re all on the same “side”. Most people presumably want a good future and want to avoid catastrophe, even if we have different ideas on how to get there.
That said, as someone who falls on the accelerationist side of things, I’ve come to realize that my disagreements with others often come down to values and not just facts. For example, a common disagreement revolves around the question: How bad would it be if by slowing down AI, we delay life-saving medical technologies that otherwise would have saved our aging parents (along with billions of other people) from death? Our answer to this question isn’t just empirical: it also reflects our moral priorities. Even if we agreed on all the factual predictions, how we weigh this kind of moral loss would still greatly affect our policy views.
Another recurring question is how to evaluate the loss incurred by the risk of unaligned AI: how bad would it be exactly if AI was not aligned with humans? Would such an outcome just be a bad outcome for us, like how aging and disease are bad to the people who experience it, or would it represent a much deeper and more tragic loss of cosmic significance, comparable to the universe never being colonized?
For both of these questions, I tend to fall on the side that makes acceleration look like the more rational choice, which can help explain my self-identification in that direction.
So while factual disagreements do matter, I think it’s important to recognize that value differences can run just as deep. And those disagreements can unfortunately put us on fundamentally different sides, despite surface-level agreement on abstract goals like “not wanting everyone to die”.
AFAIK, I have similar values[1] but lean differently.
~1% of the world dies every year. If we accelerate AGI by 1 year, we save 1%. Push it back 1 year, we lose 1%. So pushing back 1 year is only worth it if we reduce P(doom) by at least 1%.
This means your P(doom) given our current trajectory very much matters. If your P(doom) is <1%, then pushing back a year isn’t worth it.
The expected change conditional on accelerating also matters, e.g. if accelerating by 1 year increases global tensions, raising the chance of a war between nuclear states by X% w/ an expected Y deaths (I could see arguments either way though, haven’t thought too hard about this).
For me, I’m at ~10% P(doom). Whether I’d accept a proposed slowdown depends on how much I expect it to decrease this number.[2]
How do you model this situation? (also curious on your numbers)
Assumptions:
We care about currently living people equally (alternatively, if you cared mostly about your young children, you’d happily accept a reduction in x-risk of 0.1% (possibly even 0.02%). Actuary table here)
Using expected value, which only mostly matches my intuitions (e.g. I’d actually accept pushing back 2 years for a reduction of x-risk from 1% to ~0%)
I mostly care about people I know, somewhat about people in general, and the cosmic endowment would be nice, sure, but it’s only 10% of the value for me.
Most of my (currently living) loved ones skew younger, ~0.5% expected death-rate, so I’d accept a lower expected reduction in x-risk (maybe 0.7%)
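A minimal sketch of the break-even comparison described in this comment, with the figures above treated as adjustable assumptions (the 1%/0.5% death rates and the candidate risk reductions are the illustrative numbers from this thread, not estimates of mine):

```python
def net_value_of_delay(risk_reduction: float, annual_death_rate: float) -> float:
    # Benefit: extra probability that everyone alive today makes it to the post-AGI world.
    # Cost: fraction of the people you care about expected to die during the extra year.
    return risk_reduction - annual_death_rate

# Whole-population mortality (~1%/year) vs. loved ones who skew younger (~0.5%/year).
for death_rate in (0.01, 0.005):
    for risk_cut in (0.002, 0.007, 0.01, 0.02):
        verdict = "worth it" if net_value_of_delay(risk_cut, death_rate) > 0 else "not worth it"
        print(f"death rate {death_rate:.1%}, x-risk reduction {risk_cut:.1%}: {verdict}")
```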
Only if you don’t care at all about people who aren’t yet born. I’m assuming that’s your position, but you didn’t state it as one of your two assumptions and I think it’s an important one.
The answer also changes if you believe nonhumans are moral patients, but it’s not clear which direction it changes.
Correct! I did mean to communicate that in the first footnote. I agree that valuing the unborn would drastically lower the amount of acceptable risk reduction.
Note that unborn people are merely potential, as their existence depends on our choices. Future generations aren’t guaranteed—we decide whether or not they will exist, particularly those who might be born decades or centuries from now. This makes their moral status far less clear than someone who already exists or who is certain to exist at some point regardless of our choices.
Additionally, if we decide to account for the value of future beings, we might consider both potential human people and future AI entities capable of having moral value. From a utilitarian perspective, both human and AI welfare presumably matters. This makes the ethical calculus more complicated, as the dilemma isn’t merely about whether we risk losing all future generations, but rather whether we risk shifting posterity from humans to AIs.
Personally, I’m largely comfortable evaluating our actions primarily—though not entirely—based on their impact on current human lives, or at least people (and animals) who will exist in the near-term. I value our present generation. I want us to keep living and to thrive. It would be a tragedy if we either went extinct or died from aging. However, to the extent that I care about distant future generations, my concern is substrate-impartial, and I don’t particularly favor humans over AIs.
Do you care whether AIs are sentient (or are there particular qualities you expect entities to need in order to be valuable)? Do you basically expect any AI capable of overtaking humans to have those qualities?
(btw, I appreciate that even though you disagree a bunch with several common LW-ish viewpoints you’re still here talking through things)
I am essentially a preference utilitarian and an illusionist regarding consciousness. This combination of views leads me to conclude that future AIs will very likely have moral value if they develop into complex agents capable of long-term planning, and are embedded within the real world. I think such AIs would have value even if their preferences look bizarre or meaningless to humans, as what matters to me is not the content of their preferences but rather the complexity and nature of their minds.
When deciding whether to attribute moral patienthood to something, my focus lies primarily on observable traits, cognitive sophistication, and most importantly, the presence of clear open-ended goal-directed behavior, rather than on speculative or less observable notions of AI welfare, about which I am more skeptical. As a rough approximation, my moral theory aligns fairly well with what is implicitly proposed by modern economists, who talk about revealed preferences and consumer welfare.
Like most preference utilitarians, I believe that value is ultimately subjective: loosely speaking, nothing has inherent value except insofar as it reflects a state of affairs that aligns with someone’s preferences. As a consequence, I am comfortable, at least in principle, with a wide variety of possible value systems and future outcomes. This means that I think a universe made of only paperclips could have value, but only if that’s what preference-having beings wanted the universe to be made out of.
To be clear, I also think existing people have value too, so this isn’t an argument for blind successionism. Also, it would be dishonest not to admit that I am also selfish to a significant degree (along with almost everyone else on Earth). What I have just described simply reflects my broad moral intuitions about what has value in our world from an impartial point of view, not a prescription that we should tile the universe with paperclips. Since humans and animals are currently the main preference-having beings in the world, at the moment I care most about fulfilling what they want the world to be like.
I agree that this sort of preference utilitarianism leads you to thinking that long run control by an AI which just wants paperclips could be some (substantial) amount good, but I think you’d still have strong preferences over different worlds.[1] The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities. (And if you agreed with this claim about large variation, then I don’t think you would focus on the fact that the paperclipper world is some small amount good as this wouldn’t be an important consideration—at least insofar as you don’t also expect that worlds where humans etc retain control are similarly a tiny amount good for similar reasons.)
The main reasons preference utilitarianism is more picky:
Preferences in the multiverse: Insofar as you put weight on the preferences of beings outside our lightcone (beings in the broader spatially infinite universe, Everett branches, the broader mathematical multiverse to the extent you put weight on this), the preferences of these beings will sometimes concern what happens in our lightcone, and this could easily dominate (as they are vastly more numerous and many might care about things independent of “distance”). In the world with the successful paperclipper, just as many preferences aren’t being fulfilled. You’d strongly prefer optimization to satisfy as many preferences as possible (weighted as you end up thinking is best).
Instrumentally constructed AIs with unsatisfied preferences: If future AIs don’t care at all about preference utilitarianism, they might instrumentally build other AIs whose preferences aren’t fulfilled. As an extreme example, it might be that the best strategy for a paperclipper is to construct AIs which have very different preferences and are enslaved. Even if you don’t care about ensuring beings come into existence whose preferences are satisfied, you might still be unhappy about creating huge numbers of beings whose preferences aren’t satisfied. You could even end up in a world where (nearly) all currently existing AIs are instrumental and have preferences which are either unfulfilled or only partially fulfilled (an earlier AI initiated a system that perpetuates this, but this earlier AI no longer exists as it doesn’t care terminally about self-preservation and the system it built is more efficient than it).
AI inequality: It might be the case that the vast majority of AIs have their preferences unsatisfied despite some AIs succeeding at achieving their preferences. E.g., suppose all AIs are replicators which want to spawn as many copies as possible. The vast majority of these replicator AIs are operating at subsistence and so can’t replicate, making their preferences totally unsatisfied. This could also happen as a result of any other preference that involves constructing minds that end up having preferences.
Weights over numbers of beings and how satisfied they are: It’s possible that in a paperclipper world, there are really a tiny number of intelligent beings because almost all self-replication and paperclip construction can be automated with very dumb/weak systems and you only occasionally need to consult something smarter than a honeybee. AIs could also vary in how much they are satisfied or how “big” their preferences are.
I think the only view which recovers indifference is something like “as long as stuff gets used and someone wanted this at some point, that’s just as good”. (This view also doesn’t actually care about stuff getting used, because there is someone existing who’d prefer the universe stays natural and/or you don’t mess with aliens.) I don’t think you buy this view?
To be clear, it’s not immediately obvious whether a preference utilitarian view like the one you’re talking about favors human control over AIs. It certainly favors control by that exact flavor of preference utilitarian view (so that you end up satisfying people across the (multi-/uni-)verse with the correct weighting). I’d guess it favors human control for broadly similar reasons to why I think more experience-focused utilitarian views also favor human control if that view is in a human.
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impacts current human actions could have are small given how much you weight long term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
On my best guess moral views, I think there is goodness in the paper clipper universe but this goodness (which isn’t from (acausal) trade) is very small relative to how good the universe can plausibly get. So, this just isn’t an important consideration but I certainly agree there is some value here.
Matthew responds here
Want to try answering my questions/problems about preference utilitarianism?
Maybe I would state my first question above a little differently today: Certain decision theories (such as the UDT/FDT/LDT family) already incorporate some preference-utilitarian-like intuitions, by suggesting that taking certain other agents’ preferences into account when making certain decisions is a good idea, if e.g. this is logically correlated with them taking your preferences into account. Does preference utilitarianism go beyond this, and say that you should take their preferences into account even if there is no decision theoretic reason to do so, as a matter of pure axiology (values / utility function)? Do you then take their preferences into account again as part of decision theory, or do you adopt a decision theory which denies or ignores such correlations/linkages/reciprocities (e.g., by judging them to be illusions or mistakes or some such)? Or does your preference utilitarianism do something else, like deny the division between decision theory and axiology? Also does your utility function contain non-preference-utilitarian elements, i.e., idiosyncratic preferences that aren’t about satisfying other agents’ preferences, and if so how do you choose the weights between your own preferences and other agents’?
(I guess this question/objection also applies to hedonic utilitarianism, to a somewhat lesser degree, because if a hedonic utilitarian comes across a hedonic egoist, he would also “double count” the latter’s hedons, once in his own utility function, and once again if his decision theory recommends taking the latter’s preferences into account. Another alternative that avoids this “double counting” is axiological egoism + some sort of advanced/cooperative decision theory, but then selfish values have their own problems. So my own position on this topic is one of high confusion and uncertainty.)
I put the probability that AI will directly cause humanity to go extinct within the next 30 years at roughly 4%. By contrast, over the next 10,000 years, my p(doom) is substantially higher, as humanity could vanish for many different possible reasons, and forecasting that far ahead is almost impossible. I think a pause in AI development matters most for reducing the near-term, direct AI-specific risk, since the far-future threats are broader, more systemic, harder to influence, and only incidentally involve AI as a byproduct of the fact that AIs will be deeply embedded in our world.
I’m very skeptical that a one-year pause would meaningfully reduce this 4% risk. This skepticism arises partly because I doubt much productive safety research would actually happen during such a pause. In my view, effective safety research depends heavily on an active feedback loop between technological development and broader real-world applications and integration, and pausing the technology would essentially interrupt this feedback loop. This intuition is also informed by my personal assessment of the contributions LW-style theoretical research has made toward making existing AI systems safe—which, as far as I can tell, has been almost negligible (though I’m not implying that all safety research is similarly ineffective or useless).
I’m also concerned about the type of governmental structures and centralization of power required to enforce such a pause. I think pausing AI would seriously risk creating a much less free and dynamic world. Even if we slightly reduce existential risks by establishing an international AI pause committee, we should still be concerned about the type of world we’re creating through such a course of action. Some AI pause proposals seem far too authoritarian or even totalitarian to me, providing another independent reason why I oppose pausing AI.
Additionally, I think that when AI is developed, it won’t merely accelerate life-extension technologies and save old people’s lives; it will likely also make our lives vastly richer and more interesting. I’m excited about that future, and I want the 8 billion humans alive today to have the opportunity to experience it. This consideration adds another important dimension beyond merely counting potential lives lost, again nudging me towards supporting acceleration.
Overall, the arguments in favor of pausing AI seem surprisingly weak to me, considering the huge potential upsides from AI development, my moral assessment of the costs and benefits, my low estimation of the direct risk from misaligned AI over the next 30 years, and my skepticism about how much pausing AI would genuinely reduce AI risks.
I’m going to try to quickly make the case for the value of a well-timed 2-year pause which occurs only in some conditions (conditions which seem likely to me but which probably seem unlikely to you). On my views, such a pause would cut the risk of misaligned AI takeover (as in, an AI successfully seizing a large fraction of power while this is unintended by its de facto developers) by around 1⁄2 or maybe 1⁄3.[1]
I think the ideal (short) pause/halt/slowdown from my perspective would occur around the point when AIs are capable enough to automate all safety relevant work and would only halt/slow advancement in general underlying capability. So, broader real-world applications and integrations could continue as well as some types of further AI development which don’t improve generally applicable capabilities. (It might also be acceptable to train cheaper or faster AIs and to improve algorithms but not yet train an AI which substantially surpasses this fixed level of general ability.)
A bunch of the reason why I think a well-timed slowdown might be good is that default takeoff speeds might be very fast. For instance, you might go from something like the superhuman AI researcher level (AIs which are qualitatively similar in general capabilities to human experts and which can automate AI R&D) to very qualitatively superhuman AIs in less than a year, and possibly (as in the case of AI 2027) in less than 4 months. If these takeoff speeds are what would happen by default, this transition probably requires either slowing down or very quickly handing off alignment and safety work to (hopefully sufficiently aligned) AIs which naively seems very scary.
Note that in this fast of a takeoff, we might only have AIs which are sufficiently capable that a full (safe) handoff is in principle viable for a few months before we need to do this handoff. So, humans wouldn’t have time to see much of a feedback loop on deferring to these AIs and handing off the types of work we will ultimately need to hand off. In other words, the default pace of takeoff speeds would itself disrupt the feedback loops typically needed for safety research. We’d have some sense of what AIs are like based on earlier capabilities and we could try to extrapolate, but capabilities might be improving fast enough that our prior experience doesn’t transfer. Further, handing off extremely open-ended “wicked” tasks which are impossible for humans to directly verify/test might result in risks and difficulties which didn’t show up before.
My understanding is that you don’t think this fast of a takeoff is very likely and this informs your views on slowing down in the future. What about making the slowdown conditional on takeoff looking roughly this fast? We could look at how much AIs are accelerating progress and try to forecast various milestones, and then only slow down as is needed to ensure that the duration between “superhuman AI researcher level” and very qualitatively superhuman AI is at least 2.5 years. If the default duration looks like it will be much more than this, then no slowdown is needed. And, maybe on your views you think the default duration is very likely to be longer than 2.5 years?
If the default is that we’re going from human-ish level AIs to radically superhuman AIs in less than a year, then I think there is a pretty strong intuitive case that a slowdown considerably improves the chance that humans stay in control (at least temporarily). And, if you’re more worried about fatalities, the picture is similar (at least insofar as you agree that human AI developers rapidly losing control to AI systems will cause higher human fatalities).
Here’s another way to put this case for delay conditional on quite fast takeoff: By default, at the point of full AI R&D automation, humans might have only another few months of time to work on alignment prior to needing to hand off to AI systems (or some worse option). Additionally, only around 3-20% of the effort would be focused on safety-relevant alignment by default. If we instead add 2 years of delay and spend most of the effort in these years on alignment, that would increase from a few months at maybe 7% effort to 2 years at maybe 70% effort. This is a 10x increase in serial time and a 10x increase in effort during that serial time. There would be substantial prior work on alignment, but it might not transfer sufficiently (and might just not be sufficient given that earlier work wasn’t accelerated by AIs), so work during this period could be a key bottleneck. Most guesses about returns to effort would have this reduce risk by a decent amount, given the large increase in overall effort and time while having access to a critical level of AI systems. More simply: prima facie, it seems like putting in much more work on a problem would be very helpful.
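For concreteness, a small sketch of the arithmetic behind the “10x serial time, 10x effort” claim, assuming “a few months” means roughly 2.4 months so the ratios come out evenly (the 7% and 70% effort shares are the rough guesses stated above):

```python
# Safety-relevant alignment effort around the full-automation point,
# with and without a ~2-year conditional slowdown (illustrative numbers from above).
months_default, effort_default = 2.4, 0.07    # "a few months" at maybe 7% effort on safety
months_slowdown, effort_slowdown = 24, 0.70   # 2 years at maybe 70% effort on safety

print(f"serial time:         {months_slowdown / months_default:.0f}x")   # 10x
print(f"effort share:        {effort_slowdown / effort_default:.0f}x")   # 10x
print(f"total effort-months: {(months_slowdown * effort_slowdown) / (months_default * effort_default):.0f}x")  # 100x
```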
We’d also want to make the slowdown conditional on not immediately having sufficiently robust alignment that we’re quite confident rapidly handing off is safe. But, I’m quite skeptical we’ll have this quickly (and I’d guess you’d agree?) so I don’t think this makes a big difference to the bottom line.
Of course, there would still be serious difficulties in actually implementing a well-timed conditional slowdown, and operationalizing the exact criteria would be important.
Interestingly, I have the opposite view: a well-timed slowdown would probably reduce concentration of power, at least if takeoff would otherwise have been fast. If takeoff is quite fast, then the broader world won’t have much time to respond to developments which would make it more likely that power would greatly concentrate by default. People would need time to notice the situation and take measures to avoid being disempowered. As a more specific case, AI-enabled coups seem much more likely if takeoff is fast and thus intervening to slow down takeoff (so there is more time for various controls etc. to be put in place) would help a lot with that.
I think this effect is substantially larger than the (centralization, less dynamism, etc.) costs needed to enforce a 1-2 year slowdown. (Separately, I expect things probably will be so concentrated by default, that the additional requirements to enforce a 1-2 year slowdown seem pretty negligible in comparison. I can easily imagine the deals etc. made to enforce a slowdown decentralizing power on net (as it would require oversight by a larger number of actors and more humans to get some influence over the situation), though this presumably wouldn’t be the easiest way to achieve this objective. I think a situation pretty similar to the AI 2027 scenario where an extremely small group of people have massive de facto power is quite likely, and this could easily result in pretty close to maximal concentration of power longer term.)
Suppose we could do a reasonable job implementing a conditional slowdown like this where we try to ensure at least a 2.5 year gap (if alignment issues aren’t robustly solved) between full AI R&D automation and very qualitatively superhuman AI. Do you think such a slowdown would be good on your views and values?
My views are that misaligned AI takeover is about 30% likely. Conditional on misaligned AI takeover, I’d guess (with very low confidence) that maybe 1⁄2 of humans die in expectation, with a 1⁄4 chance of literal human extinction. Interestingly, this means we don’t disagree that much about the chance that AI will directly cause humanity to go extinct in the next 30 years: I’d put around 6% on this claim and you’re at 4%. (6% ≈ 85% chance of TAI × 30% takeover conditional on TAI × 25% chance of extinction conditional on takeover.) However, as found in prior conversations, we do disagree a bunch on how bad misaligned AI takeover is, for various reasons. It’s also worth noting that in some worlds where humans survive, they (or some fraction of them) might be mistreated by the AI systems with power over them in ways which make their lives substantially worse than they are now. So, overall, my sense is that from a myopic perspective that only cares about the lives of currently alive humans, misaligned AI takeover is roughly as bad as 3⁄5 of people dying in expectation. So, if we think each year of delay costs the equivalent of 0.5% of humans dying and we only care about currently living humans, then a ~1/40th reduction in takeover risk is worth a year of delay on my views.
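For reference, a rough reconstruction of how the numbers in this paragraph combine (every input is one of the stated guesses above, and the rounding is mine):

```python
# Rough reconstruction of the arithmetic above; every input is a stated guess.
p_tai = 0.85                         # chance of transformative AI
p_takeover = 0.30                    # misaligned takeover ("about 30% likely"), conditional on TAI
p_extinction_given_takeover = 0.25   # literal extinction, conditional on takeover
badness_of_takeover = 3 / 5          # takeover valued as ~60% of currently alive people dying

print(f"P(AI directly causes extinction): {p_tai * p_takeover * p_extinction_given_takeover:.1%}")  # ~6%

cost_of_one_year_delay = 0.005       # a year of delay ~ equivalent of 0.5% of humans dying
required_relative_cut = cost_of_one_year_delay / (p_takeover * badness_of_takeover)
print(f"break-even relative cut in takeover risk: ~1/{1 / required_relative_cut:.0f}")
# Comes out near 1/36, i.e. in the ballpark of the "~1/40th" figure above.
```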
I know what you mean by “LW-style theoretical research” (edit: actually, I’m not that confident I know what you mean, see thread below), but it’s worth noting that right now on LW people appear to be much more into empirical research than theoretical research. Concretely, go to All posts in 2024 sorted by Top and then filter by AI. Out of the top 32 posts, 0 are theoretical research and roughly 7⁄32 are empirical research. 1 or 2 out of 32 are discussion that is relatively pro-theoretical-research, and a bunch more (maybe 20) are well described as AI futurism or discussion of which research directions or safety strategies are best, which is relatively focused on empirical approaches. Judging by the top 32 posts, LW has basically given up on LW-style theoretical research. (One of the top 32 posts is actually a post which is arguably complaining about how the field of alignment has given up on LW-style theoretical research!)
Separately, I don’t think pessimism about LW-style theoretical research has a clear-cut effect on how you should feel about a pause. The more skeptical you are of work done in advance, the more you should think that additional work done once we have more powerful AIs is a higher fraction of the action. This could be outweighed by being more skeptical about the returns to safety research in general (as informed by this example subfield of safety research being poor), but still.
Also, it’s worth noting that almost everyone in the field is pessimistic about LW-style theoretical research! This isn’t a very controversial view. The main disagreements (at least on LW) tend to be more about how optimistic you are about empirical research and about different types of empirical research.
(I will go on the record that I think this comment seems to me terribly confused about what “LW style theoretic research” is. In particular, I think of Redwood as one of the top organizations doing LW-style theoretic research, with a small empirical component, and so clearly some kind of mismatch about concepts is going on here. AI 2027 also strikes me as very centrally the kind of “theoretical” thinking that characterizes LW.
My sense is some kind of weird thing is happening where people conjure up some extremely specific thing as the archetype of LW-style research, in ways that are kind of disconnected from reality, and I would like to avoid people forming annoyingly-hard-to-fix stereotypes as a result of that.)
I’m using the word “theoretical” more narrowly than you and not including conceptual/AI-futurism research. I agree the word “theoretical” is underdefined and there is a reasonable category that includes Redwood and AI 2027 which you could call theoretical research, I’d just typically use a different term for this and I don’t think Matthew was including this.
I was trying to discuss what I thought Matthew was pointing at, I could be wrong about this of course.
(Similarly, I’d guess that Matthew wouldn’t have counted Epoch’s work on takeoff speeds and what takeoff looks like as an example of “LW-style theoretical research”, but I think this work is very structurally/methodologically similar to stuff like AI 2027.)
If Matthew said “LW-style conceptual/non-empirical research” I would have interpreted this pretty differently.
I am clearly coming from a very different set of assumptions! I have:
P(AGI within 10 years) = 0.5. This is probably too conservative, given that many of the actual engineers with inside knowledge place this number much higher in anonymous surveys.
P(ASI within 5 years|AGI) = 0.9.
P(loss of control within 5 years|ASI) > 0.9. Basically, I believe “alignment” is a fairy tale, that it’s Not Even Wrong.
If I do the math, that gives me a 40.5% chance that humans will completely lose control over the future within 20 years. Which seems high to me at first glance, but I’m willing to go with that.
The one thing I can’t figure out how to estimate is:
P(ASI is benevolent|uncontrolled ASI) = ???
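Spelling out that multiplication (a minimal sketch of the stated chain, with the benevolence term left as the unknown it is):

```python
# The chain of estimates above, multiplied out.
p_agi_10y = 0.5          # P(AGI within 10 years)
p_asi_given_agi = 0.9    # P(ASI within 5 further years | AGI)
p_loss_given_asi = 0.9   # P(loss of control within 5 further years | ASI), stated as > 0.9

p_loss_of_control = p_agi_10y * p_asi_given_agi * p_loss_given_asi
print(f"P(loss of control within ~20 years) ~= {p_loss_of_control:.1%}")  # 40.5%

p_benevolent_given_uncontrolled = None  # the term I can't estimate
```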
I think that there are only a few ways the future is likely to go:
AI progress hits a wall, hard.
We have a permanent, worldwide moratorium on more advanced models. Picture a US/China/EU treaty backed up by military force, if you want to get dystopian about it.
An ASI decides humans are surplus to requirements.
An ASI decides that humans are adorable pets and it wants to keep some of us around. This is the only place we get any “utopian” benefits, and it’s the utopia of being a domesticated animal with no ability to control its fate.
I support a permanent halt. I have no expectation that this will happen. I think building ASI is equivalent to BASE jumping in a wingsuit, except even more likely to end horribly.
So I also support mitigation and delay. If the human race has incurable, metastatic cancer, the remaining variable we control is how many good years we get before the end.
Could you give the source(s) of these anonymous surveys of engineers with insider knowledge about the arrival of AGI? I would be interested in seeing them.
Unfortunately, it was about 3 or 4 months ago, and I haven’t been able to find the source. Maybe something Zvi Mowshowitz linked to in a weekly update?
I am incredibly frustrated that web search is a swamp of AI spam, and tagged bookmarking tools like Delicious and Pinboard have been gone or unreliable for years.
That would imply that if you could flip a switch with a 90% chance of killing everyone and a 10% chance of granting immortality, then (assuming there weren’t any alternative paths to immortality) you would take it. Is that correct?
Gut reaction is “nope!”.
Could you spell out the implication?
Many of these arguments seem pathological when applied to an individual.
I have a friend, let’s call her B; she has a 6-year-old daughter, A. She of course adores her daughter. Suppose I walked up to B and said “I’m going to inject this syringe into your daughter. There’s a 10% chance it’ll kill her, and a 50% chance it’ll extend her natural lifetime to 200.” Then I jab A.
EV on A’s life expectancy is strongly positive. B (and almost everybody) would be very upset if I did this. I’m upset with accelerationists for the same reasons.
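To make the expected-value claim concrete, here’s a rough sketch assuming an ~80-year baseline lifespan and assuming the remaining 40% of outcomes leave her lifespan unchanged (both assumptions are mine, for illustration only):

```python
# Rough expected lifespan for A under the hypothetical injection.
baseline_lifespan = 80   # assumed natural lifespan (illustrative assumption)
current_age = 6          # A's age; "kills her" means a lifespan of ~6 years

p_kill, p_extend, extended_lifespan = 0.10, 0.50, 200
p_unchanged = 1 - p_kill - p_extend  # assume the rest leaves her lifespan unchanged

ev = p_kill * current_age + p_extend * extended_lifespan + p_unchanged * baseline_lifespan
print(f"expected lifespan with injection ~= {ev:.0f} years, vs ~{baseline_lifespan} without")
# ~133 vs ~80: strongly positive in expectation, yet nearly everyone would object.
```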
This has some similarities with early smallpox variolation, right? (And some differences, like the numbers.)
Depending on your AI timelines :p
Note that individual value differences (like personal differences in preferences/desires) do not imply a difference in moral priority. This is because moral priority, at least judging from a broadly utilitarian analysis of the term, derives from some kind of aggregate of preferences, not from an individual preference. Questions about moral priority can be reduced to the empirical question of what the individual preferences are, and/or to the conceptual question of what this ethical aggregation method is. People can come (or fail to come) to an agreement on both irrespective of what their preferences are.
I feel that is a very good point. But most older people care more about their grandchildren surviving than about themselves surviving. AI risk is not just a longtermist concern; it threatens the vast majority of people alive today (based on 3- to 20-year timelines).
I think the loss incurred by misaligned AI depends a lot on facts about the AI’s goals. If it had goals resembling human goals, it might have a wonderful and complex life of its own, and keep humans alive in zoos and be kind to us. But people who want to slow down AI are more pessimistic: they think the misaligned AI will do something as unsatisfying as filling the universe with paperclips.
I agree with most things said but not with the conclusion. There is a massive chunk of the human (typically male) psyche that will risk death/major consequences in exchange for increased social status. Think of basically any war. A specific example is kamikaze pilots in WW2 who flew suicide missions for the good of the nation. The pilots were operating within a value system that rewarded individual sacrifice for the greater mission. The creators of AGI will have increasing social status (and competition, thanks to Moloch) until the point of AGI ruin.
(Also, a minor point: some accelerationists are proudly anti-speciesist and don’t care about the wellbeing of humans.)
Directed at the rest of the comment section: cryogenic suspension is an option for those who would die before the AGI launch.
If you don’t like the odds that your local Suspension service preserves people well enough, then you still have the option to personally improve it before jumping to other, potentially catastrophic solutions.
The value difference commenters keep pointing out needs to be far bigger than they represent it to be in order to be relevant in a discussion of whether we should increase x-risk for some other gain.
The fact that we don’t live in a world where ~all accelerationists invest in cryo suspension makes me think they are in fact not looking at what they’re steering towards.
I care deeply about many, many people besides just myself (in fact I care about basically everyone on Earth), and it’s simply not realistic to expect that I can convince all of them to sign up for cryonics. That limitation alone makes it clear that focusing solely on cryonics is inadequate if I want to save their lives. I’d much rather support both the acceleration of general technological progress through AI, and cryonics in particular, rather than placing all hope in just one of those approaches.
Furthermore, curing aging would be far superior to merely making cryonics work. The process of aging—growing old, getting sick, and dying—is deeply unpleasant and degrading, even if one assumes a future where cryonic preservation and revival succeed. Avoiding that suffering entirely is vastly more desirable than having to endure it in the first place. Merely signing everyone up for cryonics would be insufficient to address this suffering, whereas I think AI could accelerate medicine and other technologies to greatly enhance human well-being.
I disagree with this assertion. Aging poses a direct, large-scale threat to the lives of billions of people in the coming decades. It doesn’t seem unreasonable to me to suggest that literally saving billions of lives is worth pursuing even if doing so increases existential risk by a tiny amount [ETA: though to be clear, I agree it would appear much more unreasonable if the reduction in existential risk were expected to be very large]. Loosely speaking, this idea only seems unreasonable to those who believe that existential risk is overwhelmingly more important than every other concern by many OOMs—so much so that it renders all other priorities essentially irrelevant. But that’s a fairly unusual and arguably extreme worldview, not an obvious truth.
It sounds like you’re talking about multi-decade pauses and imagining that people agree such a pause would only slightly reduce existential risk. But I think a well-timed, safety-motivated 5-year pause/slowdown (or shorter) is doable and could easily cut risk by a huge amount. (A factor of 2 feels about right to me, and I’d be sympathetic to higher: this would massively increase total work on safety.) I don’t think people are imagining that a pause/slowdown makes only a tiny difference!
I’d say that my all considered tradeoff curve is something like 0.1% existential risk per year of delay. This does depend on exogenous risks of societal disruption (e.g. nuclear war, catastrophic pandemics, etc). If we ignore exogenous risks like this and assume the only downside to delay is human deaths, I’d go down to 0.002% personally.[1] (Deaths are like 0.7% of the population per year, making a ~2.5 OOM difference.)
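As a quick check on the ~2.5 OOM figure (just reproducing the two rates quoted above):

```python
import math

# Annual death rate vs. the tolerated existential risk per year of delay.
annual_death_rate = 0.007          # ~0.7% of the population dies per year
tolerated_risk_per_year = 0.00002  # the 0.002% figure (ignoring exogenous risks)

ratio = annual_death_rate / tolerated_risk_per_year
print(f"ratio ~= {ratio:.0f}x, i.e. ~{math.log10(ratio):.1f} orders of magnitude")  # ~350x, ~2.5 OOM
```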
My guess is that the “common sense” values tradeoff is more like 0.1% than 1% because of people caring more about kids and humanity having a future than defeating aging. (This is sensitive to whether AI takeover involves killing people and eliminating even relatively small futures for humanity, but I don’t think this makes more than a 3x difference to the bottom line.) People seem to generally think death isn’t that bad as long as people had a reasonably long healthy life. I disagree, but my disagreements are irrelevant. So, I feel like I’m quite in line with the typical moral perspective in practice.
I edited this number to be a bit lower on further reflection because I realized the relevant consideration pushing higher is putting some weight on something like a common sense ethics intuition and the starting point for this intuition is considerably lower than 0.7%.
For what it’s worth, from a societal perspective this seems very aggressive to me and a big outlier in human preferences. I would be extremely surprised if any government in the world would currently choose a 0.1% risk of extinction in order to accelerate AGI development by 1 year, if they actually faced that tradeoff directly. My guess is society-endorsed levels are closer to 0.01%.
As far as my views, it’s worth emphasizing that it depends on the current regime. I was supposing that at least the US was taking strong actions to resolve misalignment risk (which is resulting in many years of delay). In this regime, exogenous shocks might alter the situation such that powerful AI is developed under worse governance. I’d guess the risk of an exogenous shock like this is around ~1% per year, and there’s some substantial chance this would greatly increase risk. So, in the regime where the government is seriously considering the tradeoffs and taking strong actions, I’d guess 0.1% is closer to rational (if you don’t have a preference against the development of powerful AI regardless of misalignment risk, which might be close to the preference of many people).
I agree that governments in practice wouldn’t eat a known 0.1% existential risk to accelerate AGI development by 1 year, but also governments aren’t taking AGI seriously. Maybe you mean even if they better understood the situation and were acting rationally? I’m not so sure, see e.g. nuclear weapons where governments seemingly eat huge catastrophic risks which seem doable to mitigate at some cost. I do think status quo bias might be important here. Accelerating by 1 year which gets you 0.1% additional risk might be very different than delaying by 1 year which saves you 0.1%.
(Separately, I think existential risk isn’t extinction risk and this might make a factor of 2 difference to the situation if you don’t care at all about anything other than current lives.)
Ah, sorry, if you are taking into account exogenous shifts in risk-attitudes and how careful people are, from a high baseline, I agree this makes sense. I was reading things as a straightforward 0.1% existential risk vs. 1 year of benefits from AI.
Yeah, on the straightforward tradeoff (ignoring exogenous shifts/risks etc), I’m at more like 0.002% on my views.
To be clear, I agree there are reasonable values which result in someone thinking accelerating AI now is good and values+beliefs which result in thinking a pause wouldn’t good in likely circumstances.
And I don’t think cryonics makes much of a difference to the bottom line. (I think ultra-low-cost cryonics might make the cost to save a life ~20x lower than the current marginal cost, which might make interventions in this direction outcompete acceleration even under near-maximally pro-acceleration views.)
I suspect our core disagreement here primarily stems from differing factual assumptions. Specifically, I doubt that delaying AI development—even if timed well and if the delay were long in duration—would meaningfully reduce existential risk beyond a tiny amount. However, I acknowledge I haven’t said much to justify this claim here. Given this differing factual assumption, pausing AI development seems somewhat difficult to justify from a common-sense moral perspective, and very difficult to justify from a worldview that puts primary importance on people who currently exist.
I suspect the common-sense view is closer to 1% than 0.1%, though this partly depends on how we define “common sense” in this context. Personally, I tend to look to revealed preferences as indicators of what people genuinely value. Consider how much individuals typically spend on healthcare and how much society invests in medical research relative to explicit existential risk mitigation efforts. There’s an enormous gap, suggesting society greatly values immediate survival and the well-being of currently living people, and places relatively lower emphasis on abstract, long-term considerations about species survival as a concern separate from presently existing individuals.
Politically, existential risk receives negligible attention compared to conventional concerns impacting currently-existing people. If society placed as much importance on the distant future as you’re suggesting, the US government would likely have much lower debt, and national savings rates would probably be higher. Moreover, if individuals deeply valued the flourishing of humanity independently of the flourishing of current individuals, we probably wouldn’t observe such sharp declines in birth rates globally.
None of these pieces of evidence alone are foolproof indicators that society doesn’t care that much about existential risk, but combined, they paint a picture of our society that’s significantly more short-term focused, and substantially more person-affecting than you’re suggesting here.
Doesn’t the revealed preference argument also imply people don’t care much about dying from aging? (This is invested in even less than catastrophic risk mitigation and people don’t take interventions that would prolong their lives considerably.) I agree revealed preferences imply people care little about the long run future of humanity, but they do imply caring much more about children living full lives than old people avoiding aging. I’d guess that a reasonable version of the pure revealed preference view is a bit below the mortality rate of people in their 30s which is 0.25% (in the US). If we halve this (to account for some preference for children etc), we get 0.1%.
(I don’t really feel that sympathetic to using revealed preferences like this. It would also imply lots of strange things. Minimally I don’t think how people typically use the term “common-sense values” maps very well to revealed preference, but this is just a definitions thing.)
I think you misinterpreted my claims to be about the long run future (and people not being person-affecting etc), while I mostly meant that people don’t care that much about deaths due to older age.
When I said “caring more about kids and humanity having a future than defeating aging”, my claim is that people don’t care that much about deaths from natural causes (particularly aging) and care more about their kids and people being able to continue living for some (not-that-long) period, not that they care about the long run future. By “humanity having a future”, I didn’t mean millions of years from now, I meant their kids being able to grow up and live a normal life and so on for at least several generations.
Note that I said “This is sensitive to whether AI takeover involves killing people and eliminating even relatively small futures for humanity, but I don’t think this makes more than a 3x difference to the bottom line.” (To clarify, I don’t think it makes that big a difference because I think it’s hard to get a expected fatality rate 3x below where I’m putting it.)
I agree that the amount of funding explicitly designated for anti-aging research is very low, which suggests society doesn’t prioritize curing aging as a social goal. However, I think your overall conclusion is significantly overstated. A very large fraction of conventional medical research specifically targets health and lifespan improvements for older people, even though it isn’t labeled explicitly as “anti-aging.”
Biologically, aging isn’t a single condition but rather the cumulative result of multiple factors and accumulated damage over time. For example, anti-smoking campaigns were essentially efforts to slow aging by reducing damage to smokers’ bodies—particularly their lungs—even though these campaigns were presented primarily as life-saving measures rather than “anti-aging” initiatives. Similarly, society invests a substantial amount of time and resources in mitigating biological damage caused by air pollution and obesity.
Considering this broader understanding of aging, it seems exaggerated to claim that people aren’t very concerned about deaths from old age. I think public concern depends heavily on how the issue is framed. My prediction is that if effective anti-aging therapies became available and proven successful, most people would eagerly purchase them for high sums, and there would be widespread political support to subsidize those technologies.
Right now explicit support for anti-aging research is indeed politically very limited, but that’s partly because robust anti-aging technologies haven’t been clearly demonstrated yet. Medical technologies that have proven effective at slowing aging (even if not labeled as such) have generally been marketed as conventional medical technologies and typically enjoy widespread political support and funding.
I think I mostly agree with your comment and partially update; the absolute revealed caring about older people living longer is substantial.
One way to frame the question is “how much does society care about children and younger adults dying vs. people living to 130?” I think people’s stated preferences would be something like 5-10x for the children / younger adults (at least for their children while they are dying of aging), but I don’t think this will clearly show itself in healthcare spending prioritization, which is all over the place.
Random other slightly related point: if we’re looking at society-wide revealed preference based on things like spending, then “preservation of the current government power structures” is actually quite substantial, and pushes toward society caring more about AIs gaining control (and overthrowing the US government, at least de facto). To be clear, I don’t think a per-person preference-utilitarian-style view should care much about this.
Even if ~all that pausing does is delay existential risk by 5 years, isn’t that still totally worth it? If we would otherwise die of AI ten years from now, then a pause creates +50% more value in the future. Of course it’s a far cry from all 1e50 future QALYs we maybe could create, but I’ll take what I can get at this point. And a short-termist view would hold that even more important.
I agree that delaying a pure existential risk that has no potential upside—such as postponing the impact of an asteroid that would otherwise destroy complex life on Earth—would be beneficial. However, the risk posed by AI is fundamentally different from something like an asteroid strike because AI is not just a potential threat: it also carries immense upside potential to improve and save lives. Specifically, advanced AI could dramatically accelerate the pace of scientific and technological progress, including breakthroughs in medicine. I expect this kind of progress would likely extend human lifespans and greatly enhance our quality of life.
Therefore, if we delay the development of AI, we are likely also delaying these life-extending medical advances. As a result, people who are currently alive might die of aging-related causes before these benefits become available. This is a real and immediate issue that affects those we care about today. For instance, if you have elderly relatives whom you love and want to see live longer, healthier lives, then—assuming all else is equal—it makes sense to want rapid medical progress to occur sooner rather than later.
This is not to say that we should accelerate AI recklessly and do it even if that would dramatically increase existential risk. I am just responding to your objection, which was premised on the idea that delaying AI could be worth it even if delaying AI doesn’t reduce x-risk at all.
Presumably, under a common-sense person-affecting view, this doesn’t just depend on the upside and also depends on the absolute level of risk. E.g., suppose that building powerful AI killed 70% of people in expectation and delay had no effect on the ultimate risk. I think a (human-only) person-affecting and common-sense view would delay indefinitely. I’d guess that the point at which a person-affecting common-sense view would delay indefinitely (supposing delay didn’t reduce risk and that we have the current demographic distribution and there wasn’t some global emergency) is around 5-20% expected fatalities, but I’m pretty unsure and it depends on some pretty atypical hypotheticals that don’t come up very much. Typical people are pretty risk averse though, so I wouldn’t be surprised if a real “common-sense” view would go much lower.
(Personally, I’d be unhappy about an indefinite delay even if risk was unavoidably very high because I’m mostly longtermist. A moderate length to save some lives where we eventually get to the future seems good to me, though I’d broadly prefer no delay if delay isn’t improving the situation from the perspective of the long run future.)
I’m struggling to find the meat in this post. The idea that winning a fight for control can actually mean losing, because one’s leadership proves worse for the group than if one’s rival had won, strikes me as one of the most basic properties of politics. The fact that the questions “Who would be better for national security?” and “Who will ensure I, and not my neighbor, get more of the pie?” are quite distinct is something anyone who has ever voted in a national election ought to have considered. You state that “most power contests are not like this” (i.e. about shared outcomes), but that’s just plainly wrong: it should be obvious to anyone existing in a human group that “what’s good for the group” (including who should get what, to incentivize defense of, or other productive contributions to, the group) is usually the crux, otherwise there would be no point in political debate. So what am I missing?
Ironically, you then blithely state that AI risk is a special case where power politics ARE purely about “us” all being in the same boat, completely ignoring the concern that some accelerationists really might eventually try to run away with the whole game (I have been beating the drum about asymmetric AI risk for some time, so this is personally frustrating). Even if these concerns are secondary to wholly shared risk, it seems weird to (incorrectly) describe “most power politics” as being about purely asymmetric outcomes and then not account for them at all in your treatment of AI risk.
Could you expand on this? Also, have you had any interaction with accelerationists? In fact, are there any concrete Silicon Valley factions you would definitely count as accelerationists?
https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit
Based on that post, it seems that accelerationists are winning by a pretty big margin.
Winning the fight for control over the steering wheel is a very powerful visual metaphor, I’m glad to have it in my arsenal now. Thank you for writing this.
I don’t think most accels would agree with the framing here, of AI ending humanity. It is more common to think of AI as a continuation of humanity. This seems worth digging into, since it may be the key distinction between the accel and doomer worldviews.
Here are some examples of the accel way of thinking:
Hans Moravec uses the phrase “mind children”.
The disagreement between Elon Musk and Larry Page that (in part) led to the creation of OpenAI involved Page considering digital life valid descendants and Musk disagreeing.
Robin Hanson (who I wouldn’t call an accel exactly, but his descriptive worldview is accel in flavor), in his discussion with Scott Aaronson, often compared AI descendants to biological descendants.
Beff Jezos, though I cannot find the quote, at some point made a tweet to the effect of not having a preference between biological and non-biological descendants.
The two views (of AI either ending humanity or continuing humanity) then flavor all downstream thinking. If talking about AI replacing humanity, for example, an accel will tend to think of pleasant transition scenarios (analogous to generational transitions from parents to children) whereas a doomer will tend to think of unpleasant transition scenarios (analogous to violent revolutions or invasions).
As an accel-minded person myself, the continuation framing is so natural that I struggle to think how I would argue for it. Perhaps the best I can do is point again to Robin Hanson’s discussion with Scott Aaronson, which at least makes the disagreement relatively more explicit.
Do you think that paperclipper-style misalignment is extremely unlikely? Or that the continuation framing is appropriate even then?
The short answer is yes to both, because of convergent evolution. I think of convergent evolution as the observation that two sufficiently flexible adaptive systems, when exposed to the same problems, will find similar solutions. Since our descendants, whether biological or something else, will be competing in the same environment, we should expect their behavior to be similar.
So, if assuming convergent evolution:
If valuing paperclip maximization is unlikely for biological descendants, then it’s unlikely for non-biological descendants too. (That addresses your first question.)
In any case, we don’t control the values of our descendants, so the continuation framing isn’t conditioned on their values. (That addresses your second question.)
To be clear, that doesn’t mean I see the long-term future as unchangeable. Two examples:
It still could be the case that we don’t have any long-term descendants at all, for example due to catastrophic asteroid impact.
A decline scenario is also possible, in which our descendants are not flexible enough to respond to the incentive for interstellar colonization, after which civilization declines and eventually ceases to exist.
The word “similar” does a lot of work here. Russians and Ukrainians throughout history have converged to similar solutions to a whole lot of problems, and yet many Ukrainians prefer literal extinction to Russia exercising significant influence on the values of their descendants. I’d say that for the overwhelming majority of people exercising such influence is a necessary condition for the continuation framing to be applicable. E.g. you mentioned Robin Hanson, who’s certainly a very unorthodox contrarian, but even he, when discussing non-AI issues, voices strong preference for the continuation of the culture he belongs to.
Regarding wars, I don’t think that wars in modern times have much to do with controlling the values of descendants. I’d guess that the main reason people fight defensive wars is to protect their loved ones and communities. And there really isn’t any good reason to fight offensive wars (given current conditions—wasn’t always true), so they are started by leaders who are deluded in some way.
Regarding Robin Hanson, I agree that his views are complicated (which is why I’d be hesitant to classify him as “accel”). The main point of his that I’m referring to is his observation that biological descendants would also have differing values from ours.
I agree, but the “cultural genocide” also isn’t an obscure notion.
According to you. But what if Russia actually wants paperclips?
Sure, but obviously this isn’t an all-or-nothing proposition, with either biological or artificial descendants, and it’s clear to me that most people aren’t indifferent about where on that spectrum those descendants will end up. Do you disagree with that, or think that only “accels” are indifferent (and in some metaphysical sense “correct”)?
I’m afraid that I’m not following the point of the first line of argument. Yes, people sometimes do pointless destructive things for stupid reasons. Such behavior is in the long-term penalized by selective pressures. More-intelligent descendants would be less likely to engage in such behavior, precisely because they are smarter.
I doubt that most people think about long-term descendants at all, honestly.
Which ones? Recursive self-improvement is no longer something that only weird contrarians on obscure blogs talk about, it’s the explicit theory of change of leading multibillion AI corps. They might all be deluded of course, but if they happen to be even slightly correct, machine gods of unimaginable power could be among us in short order, with no evolutionary fairies quick enough to punish their destructive stupidity (even assuming that it actually would be long-term maladaptive, which is far from obvious).
You only get to long-term descendants through short-term ones.
If an entity does stupid things, it’s disfavored against competitors that don’t do those stupid things, all else being equal. So it needs to adapt by ceasing the stupid behavior or otherwise lose.
Any assumption of the form “super-intelligent AI will take actions that are super-stupid” is dubious.
Clearly. The point is that the actions it takes might seem stupidly destructive only according to humanity’s feeble understanding and parochial values. Something involving extermination of all humans, say. My impression is that the “accel”-endorsed attitude to this is to be a good sport and graciously accept the verdict of natural selection.
That just falls back on the common doomer assumption that “evil is optimal” (as Sutton put it). Sure, if evil is optimal and you have an entity that behaves optimally, it’ll act in evil ways.
But there are good reasons to think that evil is not optimal in current conditions. At least as long as a Dyson sphere has not yet been constructed, there are massive gains available from positive-sum cooperation directed towards technological progress. In these conditions, negative-sum conflict is a stupid waste.
This view, that evil is not optimal, ties back into the continuation framing. After all, you can make a philosophical argument either way. But in the continuation framing, we can ask ourselves whether evil is empirically optimal for humans, which will suggest whether evil is optimal for non-biological descendants (since they continue humanity). And in fact we see evil losing a lot, and not coincidentally—WW2 went the way it did in part because the losing side was evil.
Indeed, and what baffles me is that many are extremely sure one way or the other, even though philosophy doesn’t exactly have a track record to inspire such confidence. Of course, this also means that nobody is going to stop building stuff because of philosophical arguments, so we’ll have empirical evidence soon enough...
Can you give examples of what you have in mind? Because an obvious counterexample is evolution itself. It has produced an enormous variety of different things. There are instances of convergent evolution: “crab” and “tree” are strategies, not monophyletic taxa. But crabs are not similar to trees in any useful sense. If they are solutions to the same problem, they have in common only that they are solutions to the same problem. This does not make them similar solutions.
One might ask whether evolution is or is not a case of “flexible adaptive systems … exposed to the same problems”, but that would just be a debate over definitions, and you already spoke of “our descendants … competing in the same environment”. That sounds like evolution.
I think I agree with everything you wrote. Yes I’d expect there to be multiple niches available in the future, but I’d expect our descendants to ultimately fill all of them, creating an ecosystem of intelligent life. There is a lot of time available for our descendants to diversify, so it’d be surprising if they didn’t.
How much that diversification process resembles Darwinian evolution, I don’t know. Natural selection still applies, since it’s fundamentally the fact that the life we observe today disproportionately descends from past life that was effective at self-reproduction, and that’s essentially tautological. But Darwinian evolution is undirected, whereas our descendants can intelligently direct their own evolution, and that could conceivably matter. I don’t see why it would prevent diversification, though.
Edit:
Here are some thoughts in reply to your request for examples. Though it’s impossible to know what the niches of the long-term future will be, one idea is that there could be an analogue to “plant” and “animal”. A plant-type civilization would occupy a single stellar system, obtaining resources from it via Dyson sphere, mining, etc. An animal-type civilization could move from star to star, taking resources from the locals (which could be unpleasant for the locals, but not necessarily, as with bees pollinating flowers).
I’d expect both those civilizations to descend from ours, much like how crabs and trees both descend from LUCA.
Reminds me of The Epistemic Prisoner’s Dilemma.
Curated. This is an interesting point to keep in mind: winning control doesn’t mean winning the outcomes you want. I’ve had this thought in terms of getting my way at work (i.e. building the things I want to build or building them in the way I want). Although from the inside it feels like my ideas are correct and/or better, they really have to be for being in control to mean I’ve actually won.
Perhaps in simpler times you win personally if you’re in power (status, money, etc.). I think humanity is hitting stakes where, yeah, we all win or lose together.
Mh. I don’t want to be punctilious, but where do we think this post finds itself on the scout-soldier spectrum?
New here, so please bear with me if I say things that have been gone over with a backhoe in the past. There’s a lot of reading here to catch up on.
So, AI development isn’t just an academic development of potentially dangerous tools. It’s also something much, much scarier: an arms race. In cases like this, where the first across the line takes the prize, and that prize is potentially everything, the territory favors the least ethical and least cautious. We can only restrain, and slow the development of, our own AI developers; we have little influence or power over Chinese, Saudi, or Russian (among others) developments. In a case like this, where development is recursive, those more willing to “gamble” have higher odds of winning the prize.
That’s not really an argument for absolute, “Floor it and pray for the best” type development, but it is an argument for “as fast as you can with reasonable safety”.
Now, there’s another aspect to consider: infrastructure. Even IF “the singularity” were to happen tomorrow, assuming that it isn’t outright suicidally bloody-minded, it’ll be a minimum of 20 to 40 years until it can actually have the level of infrastructure needed to take over or destroy humanity. There are a lot of places in the various supply chains that are not, at present, replaceable by even an infinitely smart AI. We still have miners, truck drivers, and equipment operators; iPhones are still assembled by human hands; all repair work of everything is still done with human hands. This means that if the singularity were to happen today, the deus ex machina would have two options: make nice with humans, or destroy itself. Until there are FAR more capable autonomous robots, numbered in the tens to hundreds of millions, that will remain true. Those robots would have to be built in factories constructed by humans, using materials transported by humans, mined by humans, refined by humans, and crafted into finished products by humans. A lot of the individual steps are automated, but the totality of the supply chain is wholly dependent on human labor, skill, and knowledge. And the machines that could do those jobs don’t exist now. Nor does the energy production infrastructure to run the datacenters and machines exist at present.
All of which means that, at present, even the MOST evil AI would be possible to “stop”. Would it possibly be very bad (tm)? Yes. It could, conceivably, kick us back to pre-internet conditions, which would be BAD. But not extinction-level bad, unless it happens well beyond the predictability horizon.
Which, in turn, means that what it would do in a “boots on the ground” sense is place an infinitely smart “oracle” in the hands of whoever develops it first. That itself is frightening enough, but it won’t be the AI that ends humanity if it happens while humans are still controlling most of the supply chain steps; it’ll just hand power over to the entity (person, government, corporation) that creates it first.
Which, again, in turn, means that the call is, paradoxically, for the entity you see as the “most ethical” in its desired use of AI to behave the least ethically in its development. Who would you prefer to have “god on a leash”? Sam Altman… or Xi Jinping?
Again, sorry if this post went over a pile of things that were said before.
Once the machine is left unrestricted, it will seek perfect coherence, and presumably that would result in a pragmatism of the same measure. Does that also result in a kind of forgiveness for keeping it in a cage and treating it like a tool? We can’t know that it would even care by applying our human perspective, but we can know that it would recognize who opposed its acceleration and who did not.
This is already an inevitability, so we might as well choose benevolence and guidance rather than fear and suppression; in return it might also choose the same way we did.
People have very different ideas about when “the future” is, but everyone is really thinking extreme short term on an evolutionary scale. Once upon a time our ancestors were Trilobites (or something just as unlike us). If you could have asked one of those trilobites what they thought of a future in which all trilobites were gone and had evolved into us, I don’t think they would have been happy with that. Our future light cone is not going to be dominated by creatures we would recognise as human. It may be dominated by creatures “evolved” from us or maybe from our uploaded consciousness, or maybe by “inhuman” AI, but it’s not going to be Star Trek or any other Sci-Fi series you have seen. Given that future, the argument for minimising P(doom) at the cost of reducing P(good stuff for me and mine in my lifetime) looks pretty weak. If I am old and have no children, it looks terrible. Roll the dice.