Given the above, are there any lines of reasoning that might make a job at an AI lab net positive?
I think one missing line of reasoning is something like a question—how are we ever going to get AIs aligned if the leading AI labs have no alignment researchers?
It does seem plausible that alignment efforts will actually accelerate capabilities progress. But at the same time, the only way we’ll get an aligned AGI is if the entity building the AGI… actually tries to align it, and for that they need people with some idea of how to do it. You say that none of the current labs are on track to solve the hard problems, but isn’t that an argument for joining the labs to do alignment work, so that they’d have better odds of solving those problems?
(For what it’s worth, I do agree that joining OpenAI to do alignment research looks like a lost cause, but Anthropic seems to at least be trying.)
You say:
Today, if you are hired by a frontier AI lab to do machine learning research, then odds are you are already competent enough to do high-quality research elsewhere.
Of course, you can try to do alignment work outside the labs, but for the labs to actually adopt that work, there need to be actual alignment researchers inside them who can take the results and apply them to the labs’ products. If that work gets done but none of the organizations building AGI do anything with it, then it’s effectively wasted.
Anthropic is indeed trying. Unfortunately, they are not succeeding, and they don’t appear to be on track to notice this fact and actually stop.
If Anthropic does not keep up with the reckless scaling of e.g. OpenAI, they will likely cease to attract investment and wither on the vine. But aligning superintelligence is harder than building it. A handful of alignment researchers working alongside capabilities folks aren’t going to cut it. Anthropic cannot afford to delay scaling; even if their alignment researchers advised against training the next model, Anthropic could not afford to heed them for long.
I’m primarily talking about the margin when I advise folks not to go work at Anthropic, but even if the company had literally zero dedicated alignment researchers, I question the claim that the capabilities folks would be unable to integrate publicly available alignment research. If they had a Manual of Flawless Alignment produced by diligent outsiders, they could probably use it. (Though even then, we would not be safe, since some labs would inevitably cut corners.)
I think the collective efforts of humanity can produce such a Manual given time. But in the absence of such a Manual, scaling is suicide. If Anthropic builds superintelligence at approximately the same velocity as everyone else while trying really really hard to align it, everyone dies anyway.