Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who’s hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role.
Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that’s not the focus. In those conversations, people seem mostly worried about where they’ll thrive. And I think that’s often the correct concern.
Most people aren’t in crunch mode, in super short timelines mode; even if their models would license that, I think they don’t know how to do it without throwing their minds away or Pascal’s mugging themselves. And if they’re playing a longer time horizon game, the plan can’t be to run unsustainably forever. People probably make better plans if they’re honest about their limits.
But, given that they’re willing to trade off so much impact for fit, I’m surprised that basically no one mentions working for or starting a non-x-risk org. And when it is brought up, it’s very perfunctory: “you could work at a non-x-risk place”, “maybe you could just do a startup?”. It’s not as near-mode as the discussions above. There’s no discussion of fit or even of specific ideas.
It seems like people don’t really entertain their outside options. And I think that’s pretty bad. People are focused on staying x-risk-themed even when it doesn’t make x-risk-sense.
But, if you don’t get an x-risk-themed job, if you go and work for Google rather than Anthropic, you can’t go to Constellation, you can’t get an office at Lighthaven, you are judged by some, you’re somewhat less connected to your social scene, invited to fewer things, and maybe that snowballs into more isolation.
Listen, it makes a lot of sense to want to be around people who are woke to x-risk. Some kinds of orienting are hard to do alone. It can be a good idea to “work with your door open” à la Hamming, so you passively get exposed to new ideas and opportunities, ones that actually make sense. It can be alienating to spend time with people who aren’t woke to x-risk. The x-risk crowd is positively selected in a bunch of ways — ways that you may also be, so it’s reasonable to want to join that crowd. And, if you’ve worked around here for a while, then you have a bunch of personal and professional connections with people here, connections you probably want to keep.
If you work on something x-risk-themed, it can help you think about x-risk, even if you don’t believe in your day-to-day work. It might be easier to turn the problem over in your mind if you live where you do and see who you see because of it. If you feel dissatisfied with your work, the shape of your dissatisfactions might tell you something about what you’d rather do instead.
But people underestimate the dangers of working on x-risk-themed stuff.
Here’s one. Crucial considerations and sign flips are common. Maybe if you try to create compromise policies that lots of interest groups will support, you won’t be credible enough at a critical juncture. Or, maybe being weird on the internet will mean no one in DC takes AI seriously. Maybe working at labs is the only way to have actual leverage over the important decisions. Or, maybe it provides a fig leaf that makes it easy for them to fob off complaints, while you get corrupted and advance their agendas.
Here’s another danger, which I think may be worse. If you insist on working somewhere x-risk-themed, you’re asking for someone to make you a sucker.
When I was in college, I was mugged. A few evenings later, I was hanging out at my friend S’s house, and it was getting late. My friend W was about to leave to walk home, when he remembered the mugger. Enjoying the feeling of being a bit scared, he decided he needed to be able to defend himself. “C’mon, S,” he said, “You’ve got to give me something! I can’t go back without something to defend myself.” Eventually, S scrounged up a hammer. A hammer is not a weapon. And it’s not a good idea to defend yourself from a phone theft with a hammer anyway. A stolen phone is a cheap price to avoid prison time and mental scarring.
Sometimes I want something to defend myself from x-risk. It’s a little bit like when you’re performing, and you don’t know what to do with your hands. And it’s a little bit like W’s “you’ve got to give me something!”. How could I be unarmed when I face the end? And so I’m looking around, looking for something to pick up in both hands, to have something I’m doing about it all.
I think a lot of people feel something like this. This makes a good market to sell blunderbusses and levers and pitchforks, as long as they’re labelled “x-risk”. In my mind, I sometimes see this theming as a sky blue lick of paint, and someone is offering me a bunch of levers and pipes that lead off invisibly into the heavens. People are working the handles next to mine, and they care about x-risk too. That seems promising. “Operate these,” they say, “it’s part of a plan to make things better.”
For a long time, I felt envious of people with big visions of how they were going to tackle x-risk. I didn’t think their visions were good. But at least they had them! “One day,” I thought, “I will join their ranks. I will enter The Reference Class.” In my mind, I’d have a good vision, unlike all the others. It didn’t occur to me that the visions being bad was a defining characteristic of The Reference Class.
It can feel better to do something x-risk-themed than to live a life on the farm, knowing about x-risk, caring about it, but knowing you don’t know what to do.
I have the feeling that this post is getting close to a critical point, but then doesn’t quite express it.
(At the moment, I can’t quite express it myself, so I can’t complain much.)
I think I want an essay that is really making the case for this:
Musing a bit on it...
There’s something like, “the desire to help” / “the desire to be important” is an attack surface.
A lot of people want to help, but figuring out how to help is actually really hard to do. Many things to try are worthless, many more are actively harmful. Strategic thinking is hard and often annoying, and is just the kind of thing that even pretty smart people aren’t cut out for.
A lot of people (including myself) try to slot themselves into roles where they can be a force-multiplier for some strategic vision that makes sense to them, but which they couldn’t really defend from incisive critique.
I think a lot of EA-ish people treat this as a kind of neutral operation, where they’re straightforwardly glad to have an opportunity to have impact, rather than a kind of fraught transaction where one party is offering their agency and another is offering their strategic orientation / impact opportunity.
When this works well, both parties get to work together to do more good in the world than either party could alone. Which is great!
But this transaction is fundamentally one that presupposes an information asymmetry. In order for this transaction to make any sense, the party offering their agency has to have less strategic discernment than the party offering strategic orientation.
So this setup is ripe ground for scams.
I suppose the general pattern is “if someone really wants something, X, but they have a weak ability to discern if they are achieving it or not, there is an incentive gradient towards scams that make them feel like they’re getting X.”
As an additional point, there are also potentially quite significant negative externalities to slotting in—because you validate the strategic vision, which makes it able to pick up other slotters. (Cf. https://tsvibt.blogspot.com/2024/09/the-moral-obligation-not-to-be-eaten.html )
I wrote a little bit about suckerhood dynamics here. I want to expand it and then crosspost to LessWrong.
Your framing is very reminiscent of this post by John Wentworth.
Just trying to get my head around this—this argument mainly works if you think that most people’s attempts to defend against x-risk are mostly bullshit, right?
Like, if I actually believe that people in my reference class have good odds of changing stuff positively, it feels very wrong to just shrug and go somewhere else?
Part of the problem OP points out is that “changing stuff” =/= “changing stuff positively”. (“Here’s one. Crucial considerations and sign flips are common.”)
cf https://www.lesswrong.com/posts/BnFuHDueG9vRAYqLd/changing-the-world-for-the-worse
Indeed, have edited my comment to reflect that.
nonetheless, the mugger—also rational—will likely not pick a fight with the hammer bro.
though if the bluff became common...
Another danger is that you just won’t have much impact at all, positive or negative. Most x-risk-themed work will end up being a low-impact waste of time ex post, for structural reasons, among others. But if you select what to work on by trying to maximize your own personal impact and / or fit, based on the consensus vibes of your local community, you’re overwhelmingly likely to end up working on something that is predictably nil-impact ex ante.
Somewhat less fraught is to filter strictly for what you personally and independently think is both critically important and independently tractable, for detailed inside view reasons of your own, regardless of what the vibes are. Then intersect that with things that are feasible for you personally to work on and don’t constitute throwing your mind away[1]. And then accept that this set will most likely be empty for most people.
This is importantly different than “fit”—working on something outside your comfort zone or past experience profile can often be the opposite of “throwing your mind away”.