Consider chilling out in 2028
I’ll explain my reasoning in a second, but I’ll start with the conclusion:
I think it’d be healthy and good to pause and seriously reconsider the focus on doom if we get to 2028 and the situation feels basically like it does today.
I don’t know how to really precisely define “basically like it does today”. I’ll try to offer some pointers in a bit. I’m hoping folk will chime in and suggest some details.
Also, I don’t mean to challenge the doom focus right now. There seems to be some good momentum with AI 2027 and the Eliezer/Nate book. I even preordered the latter.
But I’m still guessing this whole approach is at least partly misguided. And I’m guessing that fact will show up in 2028 as “Oh, huh, looks like timelines are maybe a little longer than we once thought. But it’s still the case that AGI is actually just around the corner….”
A friend described this narrative phenomenon as something like the emotional version of a Shepard tone. Something that sure seems like it’s constantly increasing in “pitch” but is actually doing something more like looping.
(The “it” here is how people talk about the threat of AI, just to be clear. I’m not denying that AI has made meaningful advances in the last few years, or that AI discussion became more mainstream post-LLM explosion.)
I’ll spell out some of my reasoning below. But the main point of my post here is to be something folk can link to if we get to 2028 and the situation keeps seeming dire in basically the same increasing way as always. I’m trying to place something loosely like a collective stop-loss order.
Maybe my doing this will be irrelevant. Maybe current efforts will sort out AI stuff, or maybe we’ll all be dead, or maybe we’ll be in the middle of a blatant collapse of global supply chains. Or something else that makes my suggestion moot or opaque.
But in case it’s useful, here’s a “pause and reconsider” point. Available for discussion right now, but mainly as something that can be remembered and brought up again in 31 months.
Okay, on to some rationale.
Inner cries for help
Sometimes my parents talk about how every generation had its looming terror about the end of the world. They tell me that when they were young, they were warned about how the air would become literally unbreathable by the 1970s. There were also dire warnings about a coming population explosion that would destroy civilization before the 21st century.
So their attitude upon hearing folks’ fear about overpopulation, the handwringing around Y2K, Al Gore beating the drum about climate change, and the terror about the Mayan calendar ending in 2012, was that this was just the latest round of the same old pattern.
Dad would argue that this phenomenon was people projecting their fear of mortality onto the world. He’d say that on some level, most people know they’re going to die someday. But they’re not equipped to really look at that fact. So they avoid looking, and suppress it. And then that unseen yet active fear ends up coloring their background sense of what the world is like. So they notice some plausible concerns but turn those concerns into existential crises. It’s actually more important to them at that point that the problems are existential than that they’re solved.
I don’t know that he’s right. In particular, I’ve become a little more skeptical that it’s all about mortality.
But I still think he’s on to something.
It’s hard for me not to see a similar possibility when I’m looking around AI doomerism. There’s some sound logic to what folk are saying. I think there’s a real concern. But the desperate tone strikes me as… something else. Like folk are excited and transfixed by the horror.
I keep thinking about how in the 2010s it was extremely normal for conversations at rationalist parties to drift into existentially horrid scenarios. Things like infinite torture, and Roko’s Basilisk, and Boltzmann brains. Most of which are actually at best irrelevant to discuss. (If I’m in a Boltzmann brain, what does it matter?)
Suppose that what’s going on is, lots of very smart people have preverbal trauma. Something like “Mommy wouldn’t hold me”, only from a time before there were mental structures like “Mommy” or people or even things like object permanence or space or temporal sequence. Such a person might learn to embed that pain such that it colors what reality even looks like at a fundamental level. It’s a psycho-emotional design that works something like this:
If you imagine that there’s something like a traumatized infant inside such people, then its primary drive is to be held, which it seeks by crying. And yet, its only way of “crying” is to paint the subjective experience of the world in the horror it experiences, and to use the built-up mental edifice it has access to in order to try to convey to others what its horror is like.
If you have a bunch of such people getting together, reflecting back to one another stuff like
OMG yes, that’s so horrid and terrifying!!!
…then it feels a bit like being heard and responded to, to that inner infant. But it’s still not being held, and comforted. So it has to cry louder. That’s all it’s got.
But what that whole process looks like is, people reflecting back and forth how deeply fucked we are. Getting consensus on doom. Making the doom seem worse via framing effects and focusing attention on the horror of it all. Getting into a building sense of how dire and hopeless it all is, and how it’s just getting worse.
But it’s from a drive to have an internal agony seen and responded to. It just can’t be seen on the inside as that, because the seeing apparatus is built on top of a worldview made of attempts to get that pain met. There’s no obvious place outside the pain from which to observe it.
I’m not picky about the details here. I’m also not sure this is whatsoever what’s going on around these parts. But it strikes me as an example type in a family of things that’s awfully plausible.
It’s made even worse by the fact that it’s possible to name real, true, correct problems with this kind of projection mechanism. Which means we can end up in an emotional analogue of a motte-and-bailey fallacy: attempts to name the emotional problem get pushed away because naming it makes the real problem seem less dire, which on a pre-conceptual level feels like the opposite of what could possibly help. And the arguments for dismissing the emotional frame rest on the true fact that the real problem is, in fact, real. So clearly it’s not just a matter of healing personal trauma!
(…and therefore it’s mostly not about healing personal trauma, so goes the often unstated implication (best as I can tell).)
But the invitation is to address the doom feeling differently, not to ignore the real problem (or at least not indefinitely). It’s also to consider the possibility that the person in question might not be perceiving the real problem objectively because their inner little one might be using it as a microphone and optimizing what’s “said” for effect, not for truth.
I want to acknowledge that if this isn’t at all what’s going on in spaces like Less Wrong, it might be hard to demonstrate that fact conclusively. So if you’re really quite sure that the AI problem is basically as you think it is, and that you’re not meaningfully confused about it, then it makes a lot of sense to ignore this whole consideration as a hard-to-falsify distraction.
But I think that if we get to 2028 and we see more evidence of increasing direness than of actual manifest doom, it’ll be high time to consider that internal emotional work might be way, way, way more central to creating something good than is AI strategizing. Not because AI doom isn’t plausible, but because it’s probably not as dire as it always seems, and there’s a much more urgent problem demanding attention first before vision can become clear.
Scaring people
In particular, it strikes me that the AI risk community orbiting Less Wrong has had basically the same strategy running for about two decades. A bunch of the tactics have changed, but the general effort occurs to me as the same.
The gist is to frighten people into action. Usually into some combo of (a) donating money and (b) finding ways of helping to frighten more people the same way. But sometimes into (c) finding or becoming promising talent and funneling them into AI alignment research.
That sure makes sense if you’re trapped in a house that’s on fire. You want the people trapped with you to be alarmed and to take action to solve the problem.
But I think there’s strong reason to think this is a bad strategy if you’re trapped in a house that’s slowly sinking into quicksand over the course of decades. Not because you’ll all be any less dead for how long it takes, but because activating the fight-or-flight system for that long is just untenable. If everyone gets frightened but you don’t have a plausible pathway to solving the problem in short order, you’ll end up with the same deadly scenario but now everyone will be exhausted and scared too.
I also think it’s a formula for burnout if it’s dire to do something about a problem but your actions seem to have at best no effect on said problem.
I’ve seen a lot of what I’d consider unwholesomeness over the years that I think is a result of this ongoing “scare people into action about AI risk” strategy. A ton of “the ends justify the means” thinking, and labeling people “NPCs”, and blatant Machiavellian tactics. Inclusion and respect with words but an attitude of “You’re probably not relevant enough for us to take seriously” expressed with actions and behind closed doors. Amplifying the doom message. Deceit about what projects are actually for.
I think it’s very easy to lose track of wholesome morality when you’re terrified. And it can be hard to remember in your heart why morality matters when you’re hopeless and burned out.
(Speaking from experience! My past isn’t pristine here either.)
Each time a new push within this strategy has come along, it’s seemed like “This could be the key thing! LFG!!!” So far the results of those efforts seem pretty ambiguous. Maybe a bunch of them actually accelerated AI timelines. It’s hard to say.
Maybe this time is different. With AI in the Overton window and with AI 2027 going viral, maybe Nate & Eliezer’s book can shove the public conversation in a good direction. So maybe this “scare people into action” strategy will finally pay off.
But if it’s still not working when we hit 2028, I think it’ll be a really good time to pause and reconsider. Maybe this direction is both ineffective and unkind. Not as a matter of blame and shame; I think it has made sense to really try. But 31 months from now, it might be really good to steer this ship in a different direction, as a pragmatic issue of sincerely caring for what’s important to us all going forward.
A shared positive vision
I entered the rationality community in 2011. At that time there was a lot of excitement and hope. The New York rationalist scene was bopping, meetups were popping up all over the world, and lots of folk were excited about becoming Beisutsukai. MIRI (then the Singularity Institute for Artificial Intelligence) was so focused on things like the Visiting Summer Fellows Program and what would later be called the Rationality Mega Camp that they weren’t getting much research done.
That was key to what created CFAR. There was a need to split off “offer rationality training” from “research AI alignment” so that the latter could happen at all.
(I mean, I’m sure some was happening. But it was a pretty big concern at the time. Some big donors were getting annoyed that their donations weren’t visibly going to the math project Eliezer was so strongly advocating for.)
At the time there was a shared vision. A sense that more was possible. Maybe we could create a movement of super-competent super-sane people who could raise the sanity waterline in lots of different domains, and maybe for the human race as a whole, and drown out madness everywhere that matters. Maybe powerful and relevant science could become fast. Maybe the dreams of a spacefaring human race that mid-20th-century sci-fi writers spoke of could become real, and even more awesome than anyone had envisioned before. Maybe we could actually lead the charge in blessing the universe with love and meaning.
It was vague as visions go. But it was still a positive vision. It drove people to show up to CFAR’s first workshops for instance. Partly out of fear of AI, sure, but at least partly out of excitement and hope.
I don’t see or hear that kind of focus here anymore. I haven’t for a long time.
I don’t just mean there’s cynicism about whether we can go forth and create the Art. I watched that particular vision decay as CFAR muddled along making great workshops but turning no one into Ender Wiggin. It turns out we knew how to gather impressive people but not how to create them.
But that’s just one particular approach for creating a good and hopeful future.
What I mean is, nothing replaced that vision.
I’m sure some folk have shared their hopes. I have some. I’ve heard a handful of others. I think Rae’s feedbackloop-first rationality is a maybe promising take on the original rationality project.
But there isn’t anything like a collective vision for something good. Not that I’m aware of.
What I hear instead is:
AI will probably kill us all soon if we don’t do something. Whatever that “something” is.
If anyone builds it, everyone dies. And right now lots of big powerful agents are racing to build it.
We’re almost certainly doomed at this point, and all that’s left is to die with dignity.
Is it ethical to have children right now, since they probably won’t get to grow up and have lives?
No point in saving for retirement. We won’t live that long.
It reminds me of this:
Very young children (infants & toddlers) will sometimes get fixated on something dangerous to them. Like they’ll get a hold of a toxic marker and want to stick it in their mouth. If you just stop them, they’ll get frustrated and upset. Their whole being is oriented to that marker and you’re not letting them explore the way they want to.
But you sure do want to stop them, right? So what do?
Well, you give them something else. You take the marker away and offer them, say, a colorful whisk.
It’s no different with dogs or cats, really. It’s a pretty general thing. Attentional systems orient toward things. “Don’t look here” is much harder than “Look here instead.”
So if you notice a danger, it’s important to acknowledge and address it, but you also want to change your orientation toward the outcome you want.
I’ve been seriously concerned for the mental & emotional health of this community for a good while now. Its orientation, as far as I can tell, is to “not AI doom”. Not a bright future. Not shared wholesomeness. Not healthy community. But “AI notkilleveryoneism”.
I don’t think you want to organize your creativity that way. Steering toward doom as an accidental result of focusing on it would be… really quite ironic and bad.
(And yes, I do believe we see evidence of exactly this pattern. Lots of people have noticed that quite a lot of AI risk mitigation efforts over the last two decades seem to have either (a) done nothing to timelines or (b) accelerated timelines. E.g. I think CFAR’s main contribution to the space is arguably in its key role in inspiring Elon Musk to create OpenAI.)
My guess is most folk here would be happier if they picked a path they do want and aimed for that instead, now that they’ve spotted the danger they want to avoid. I bet we stand a much better chance of building a good future if we aim for one, as opposed to focusing entirely on not hitting the doom tree.
If we get to 2028 and there isn’t yet such a shared vision, I think it’d be quite good to start talking about it. What future do we want to see? What might AI going well actually look like, for instance? Or what if AI stalls out for a long time, but we still end up with a wholesome future? What’s that like? What might steps in that direction look like?
I think we need stuff like this to be whole, together.
Maybe it’ll be okay
In particular, I think faith in humanity as a whole needs to be thinkable.
Yes, most people are dumber than the average LessWronger. Yes, stupidity has consequences that smart people can often foresee. Yes, maybe humanity is too dumb not to shoot itself in the foot with a bazooka.
But maybe we’ve got this.
Maybe we’re all in this together, and on some level that matters, we all know it.
I’m not saying that definitely is the case. I’m saying it could be. And that possibility seems worth taking to heart.
I’m reminded of a time when I was talking with a “normie” facilitator at a Circling retreat. I think this was 2015. I was trying to explain how humanity seemed to be ignoring its real problems, and how I was at that retreat trying to become more effective at doing something about it all.
I don’t remember his exact words, but the sentiment I remember was something like:
I don’t understand everything you’re saying. But you seem upset, man. Can I give you a hug?
I didn’t think that mattered, but I like hugs, so I said yes.
And I started crying.
I think he was picking up on a level I just wasn’t tracking. Sure, my ideas were sensible and well thought out. But underneath all that I was just upset. He noticed that undercurrent and spoke to and met that part, directly.
He didn’t have any insights about how we might solve existential risk. I don’t know if he even cared about understanding the problem. I didn’t walk away being more efficient at creating good AI alignment researchers.
But I felt better, and met, and cared for, and connected.
I think that matters a lot.
I suspect there’s a lot going on like this. That at least some of the historical mainstream shrugging around AI has been because there’s some other level that also deeply matters that’s of more central focus to “normies” than to rationalists.
I think it needs to be thinkable that the situation is not “AI risk community vs. army of ignorant normie NPCs”. Instead it might be more like, there’s one form of immense brilliance in spaces like Less Wrong. And what we’re all doing, throughout the human race, is figuring out how to interface different forms of brilliance such that we can effectively care for what’s in our shared interests. We’re all doing it. It just looks really different across communities, because we’re all attending to different things and therefore reach out to each other in very different ways. And that’s actually a really good thing.
My guess is that it helps a lot when communities meet each other with an attitude of
We’re same-sided here. We’re in this together. That doesn’t mean we yet know how to get along in each other’s terms. But if it’s important, we’ll figure it out, even if “figure it out” doesn’t look like what either of us expect at the start. We’ll have to learn new ways of relating. But we can get there.
Come 2028, I hope Less Wrong can seriously consider, for instance, retiring terms like “NPC” and “normie”, and instead adopt a more humble and cooperative attitude toward the rest of the human race. Maybe our fellow human beings care too. Maybe they’re even paying vivid attention. It just might look different than what we’re used to recognizing in ourselves and in those most like us.
And maybe also consider that even if we don’t yet see how, and even if the transition is pretty rough at times, it all might turn out just fine. We don’t know that it will. I don’t mean to assert that it will. I mean, let’s sincerely hold and attend to the possibility that it could. Maybe it’ll all be okay.
Come 2028…
I want to reiterate that I don’t mean what’s going on right now is wrong and needs to stop. Like I said, I preordered If Anyone Builds It, Everyone Dies. I don’t personally feel the need to become more familiar with those arguments or to have new ones. And I’m skeptical about the overall approach. But it seems like a really good push within this strategy, and if it makes things turn out well, then I’d be super happy to be wrong here. I support the effort.
But we now have this plausible timeline spelled out. And by January 2028 we’ll have a reasonably good sense of how much it got right, and wrong.
…with some complication. It’s one of those predictions that interacts with what it’s predicting. So if AI 2027 doesn’t pan out, one could argue that it would have, except that the prediction going viral changed the outcome. And therefore we should keep pushing the same strategy as before, because maybe now it’s finally working!
But I’m hoping for a few things here.
One is, maybe we can find a way to make these dire predictions less unfalsifiable. Not in general, but specifically AI 2027. What differences should we expect to see if (a) the predictions were distorted due to the trauma mechanism I describe in this post vs. (b) the act of making the predictions caused them not to come about? What other plausible outcomes are there come 2028, and what do we expect sensible updates to look like at that point?
Another hope I have is that the trauma projection thing can be considered seriously. Not necessarily acted on just yet. That could be distracting. But it’s worth recognizing that if the trauma thing is really a dominant force in AI doomerism spaces, then when we get to January 2028 we might not have hit AI doom but it’s going to seem like there are still lots of reasons to keep doing basically the same thing as before. How can we anticipate this reaction, distinguish it from other outcomes, and appropriately declare an HMC event if and when it happens?
So, this post is my attempt at kind of a collective emotional stop-loss order.
I kind of hope it turns out to be moot. Because in the world where it’s needed, that’s yet another 2.5 years of terror and pain that we might have skipped if we could have been convinced a bit sooner.
But being convinced isn’t an idle point. It matters that maybe nothing like what I’m naming in this post is going on. There needs to be a high-integrity way of checking what’s true here first.
I’m hoping I’ve put forward a good compromise.
Let’s discuss for now, and then check in about it in 31 months.