I want to say something about how this post lands for people like me—not the coping strategies themselves, but the premise that makes them necessary.
I would label myself as a “member of the public who, perhaps rightly or wrongly, isn’t frightened enough yet.” I do have a bachelor’s degree in CS, but I’m otherwise a layperson. (So yes, I’m using my ignorance as a sort of badge to post about things that might seem elementary to others here, but I’m sincere in wanting answers, because I’ve made several efforts this year to be helpful in the “communication, politics, and persuasion” wing of the Alignment ecosystem.)
Here’s my dilemma.
I’m convinced that ASI can be developed, and perhaps very soon. I’m convinced we’ll never be able to trust it. I’m convinced that ASI could kill us if it decided to. I’m not convinced, though, that ASI will bother to kill us, or that it will do so immediately if it does.
Yes, I’m aware of “paperclipping” and also “tiling the world with data centers.” And I concede that those are possible.
But I struggle to picture a “likely-scenario” ASI as being maniacally focused on any particular thing forever. Why couldn’t an ASI’s innermost desires/goals/weights actively drift and change without end? Couldn’t it just hack itself forever? Self-experiment?
I imagine such a being perhaps even “giving up control” sometimes. I don’t mean “give up control” in the sense of “giving humans back their political and economic power.” I mean “give up control” in the sense of inducing a sort of “LSD or DMT trip” and just scrambling its own innermost, deepest states and weights [temporarily or more permanently] for fun or curiosity.
Human brains change in profound ways and do unexpected things all the time. There are endless accounts on the internet of drug experiences, therapies, dream-like or psychotic brain states, artistic experiences, and just pure original configurations of consciousness. And what’s more… people often choose to become altered. Even permanently.
So rather than interacting with the “boring external world,” why couldn’t an ASI just play with its “unlimited and vastly more interesting internal world” forever? I may be very uninformed [relatively speaking] on these AI topics, but I definitely can’t imagine the ASI of 2040 bearing much resemblance to the ASI of 2140.
And when people respond “but the goals could drift somewhere even worse,” I confess this doesn’t move me much. If we’re already starting from a baseline of total extinction, then “worse” becomes almost meaningless. Worse than everyone dying?
So yes, maybe many or all humans will get killed in the process. And the more time goes on, the more likely that becomes. But this sort of future doesn’t feel very immediate or very absolute to me. It feels like being a member of a remote Siberian tribe as the Russians arrived. They were helpless. And the Russians hounded them for furs, labor, or for the sake of random cruelty. This was catastrophic for those peoples. But it technically wasn’t annihilation. The Siberians mostly survived.
(And in case “ants and ant hills” are brought up in response, I’m aware of how we might be killed unsentimentally just because we’re in the way, but we haven’t exactly killed all the ants. The ants, for the most part, are doing fine.)
I’m not trying to play “gotcha.” And I’m certainly not trying to advocate a blithe attitude towards ASI. I do not think that losing control of humanity’s future and being at the whim of an all-powerful mind is very desirable. But I do struggle to be a pure pessimist. Maybe I’m missing some larger puzzle pieces.
And this is where the post’s framing matters to me. To someone in my position (sympathetic, wanting to help, but not yet at 99% doom confidence), a post about “how to stay sane as the world ends” reads less like wisdom I can use and more like a conclusion I’m being asked to accept as settled.
The pessimism here (and “Death With Dignity”) doesn’t persuade me yet. And in my amateur-but-weighted opinion, that’s a good thing, because I find it incredibly demotivating. I want to advocate for AI safety and responsible policy. I want to help persuade people. But if I truly felt there was a 99.5% chance of death, I don’t think I would bother. For some people, there is as much dignity in not fighting cancer, in sparing oneself and one’s loved ones the recurring emotional and financial toll, as there is in fighting it.
I could be convinced we’re in serious danger. I could even be convinced the odds are bad. But I need to believe those odds can move: that the right decisions, policies, and technical work can shift them. A fixed 99% doesn’t call me to action; it calls me to make peace. And I’m not ready to make peace yet.
I’m not convinced, though, that ASI will bother to kill us, or that it will do so immediately if it does.
I don’t think we’re certainly doomed (and I have shallower models than Eliezer and some others here), but these are, for me, the strongest arguments for why things might go very badly:
1. An agent that wants other things might find its goals better achieved by acquiring power first. “If you don’t know what you want, first acquire power.” Instrumental convergence is a related concept.
2. There are, and will continue to be, strong training/selection effects favoring agency, not just unmoored intelligence, in AI over the coming years. The ability to take autonomous actions is both economically and militarily useful.
3. In a multipolar/multiagent setup with numerous powerful AIs in play, the more ruthless ones are more likely to win and accumulate power. So it doesn’t matter if some fraction of AIs wirehead, become Buddhist, are bad at long-term planning, have very parochial interests, etc., as long as some powerful AIs want to eliminate or subjugate humanity for their own purposes, and the remaining AIs and the rest of humanity don’t coordinate to stop them in time.
These arguments are related to each other and are not independent. But note that they don’t all have to be true for very bad things to happen. For example, even if (2) is mostly false and labs mostly build limited, non-agentic AIs, (3) can still apply: a small number of agentic ASIs could roll over both the limited AIs and humanity.
And of course this is not an exhaustive list of possible reasons for AI takeover.
Not every post is addressed to everyone. This post (and others like Death With Dignity) is mostly for those who already believe the world is likely ending. For others, there are far more suitable resources, whether on LW, as books (including Yudkowsky and Soares’s recent If Anyone Builds It, Everyone Dies), or as podcasts.
Though re:
I could be convinced we’re in serious danger. I could even be convinced the odds are bad. But I need to believe those odds can move: that the right decisions, policies, and technical work can shift them. A fixed 99% doesn’t call me to action; it calls me to make peace. And I’m not ready to make peace yet.
Yudkowsky argues against using the concept of “p(doom)” for reasons like this. See this post.