Could you explain which AI companies[1] you considered and how joining such a company could increase x-risk compared to the options where you are not on a frontier lab safety team? My baseline extinction scenario is that OAI/Anthropic/GDM/a union thereof creates an unaligned AI which escapes before deployment, escapes after deployment, or is trusted to create a successor. The safety team’s role is to develop interpretability techniques and to find evidence of misalignment before the AI becomes capable of harming mankind. I don’t see a way for you to decrease P(finding evidence) unless your potential rival for the role was more competent. Alas, I also struggle to understand how one can estimate P(rival was more competent).
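Put in symbols (just a rough framing, treating the hire as displacing exactly one alternative candidate), the quantity in question is something like

\[
\Delta P \;=\; P(\text{evidence of misalignment found} \mid \text{you on the team}) \;-\; P(\text{evidence found} \mid \text{counterfactual hire on the team}),
\]

which is negative only if the displaced candidate would have been more competent at the job, and that is the term I don’t know how to estimate.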
[1] Unless, of course, they are xAI or Meta, which I’d rather see destroyed.
The specific company I was considering in this example was GDM, but I see Anthropic as broadly similar and OAI as somewhat worse. That’s why I didn’t name it—there was nothing in my logic that was specific to GDM.
If I put on my pessimist hat, here are the ways this could counterfactually increase extinction risk:
1. The work I do winds up being used for capabilities, and advances capabilities more than it advances safety.
2. The work I do is largely safetywashing, or focused on near-term risks that affect company reputation and/or profit, which has a much smaller or negligible effect on extinction risk. This isn’t strictly worse than doing nothing, but my choices aren’t just “Do nothing” or “Join a frontier lab”.
3. I end up getting my epistemics changed, falsely decide that alignment isn’t that hard after all, and then probably end up doing one of the other two points. (Especially re: Point 1: the harder you think alignment is, the less value you should put on frontier company work, imo. I can go into more detail on why I think this if it is non-obvious.)
By contrast, the counterfactual is that I do other useful work that might reduce extinction risk more. My alternate path, the one I currently estimate I’m very likely to take (just waiting on security clearance), is to join the new Australian AISI and try to shape their research direction in a way that helps with extinction risk. The expected value of this is well above zero in my book, so it’s not enough just to avoid increasing extinction risk; I also have to decrease extinction risk more at a frontier lab than anywhere else I could otherwise be working.
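To make that bar explicit (a rough sketch, writing \(R(\cdot)\) for my expected reduction in extinction risk from working somewhere), the frontier lab option only wins if

\[
R(\text{frontier lab}) \;>\; R(\text{best alternative, e.g. Oz AISI}),
\]

not merely \(R(\text{frontier lab}) > 0\).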
And since that’s really hard to calculate, it’s easy to wind up doing the things that follow money and status gradients as long as they have a plausible path to impact. Committing to giving away any extra money is a good way of taking away one of those incentives. The money could do a lot of good if donated, but that’s an abstract good; it doesn’t emotionally move me the way a large salary clearly does. I don’t endorse this and I am not proud of it, but it would be far worse to deny it to myself.
To state the obvious (which I don’t think you’ve stated explicitly so far): if your judgment is that there are money-bottlenecked initiatives that you could un-bottleneck with an amount like $[GDM AI safety minus Oz AISI] (and that will likely remain bottlenecked if you don’t), and your estimated counterfactual impact is comparable in both cases, this might tip the decision strongly in favor of GDM AI safety.
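Spelling that out in the same rough terms as the sketch above (an illustrative decomposition only), the comparison becomes

\[
R(\text{GDM work}) + R\big(\text{donating } \$[\text{GDM AI safety} - \text{Oz AISI}]\big) \;\text{ vs. }\; R(\text{Oz AISI work}),
\]

where the donation term only counts if the bottlenecked initiative would otherwise stay unfunded. If the two direct work terms are comparable, the donation term can dominate the decision.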