IMO not true. Maybe early on we needed really good conceptual work, and so wanted people who could clearly articulate the pros and cons of Paul Christiano’s and Yudkowsky’s alignment strategies, etc. So it would have made sense to test accordingly. But I think this is less true now: most senior researchers have more good ideas than they can execute.
I don’t think this is a strong argument in favor of the situation being meaningfully different: senior researchers having more good ideas than they have time doesn’t seem like a very new thing at all (e.g. Evan wrote a list like this over three years ago).
More importantly, this doesn’t seem inconsistent with the claim being made. If you had mentors proposing projects in very similar areas or downstream of very similar beliefs, you might still benefit tremendously from people with a good understanding of AI safety working on different things. This depends on whether you think the current project portfolio is close to as good as it can be, though. I certainly think we would benefit heavily from more people thinking about which directions are good or not, and a fair amount of current work suffers from not enough clear thinking about whether it’s useful.
That said, I am somewhat optimistic about MATS. I had very similar criticisms during MATS 5.0, when ~1/3-1/2 of all projects were in mech interp. If we’d kept funneling strong engineers into mech interp without the skills necessary to evaluate how useful it was, deferring instead to a specific set of senior researchers, I think the field would be in a meaningfully worse state today. MATS did pivot away from that afterward, which raised my opinion a fair amount (though I’m not sure what the exact mechanism was).
Also, the difficulty of doing good alignment research has increased, since we increasingly need to work with complex training setups, infrastructure, etc., to keep up with advances in capabilities. This motivates requiring a high level of technical skill.
I don’t think this is true? Like, it’s certainly true for some kinds of good alignment research, but imo very far from a majority.
I don’t think these should be considered strong criteria. IMO “believes in X-risk” is not a necessary prerequisite for doing great work that reduces X-risk. For example, building good tooling for alignment research doesn’t require this at all.
I also don’t think it’s a necessary prerequisite for doing great alignment research, but MATS is more than the projects MATS scholars work on. For example, if MATS scholars consistently did good research during MATS and then went on to be hired to work on capabilities at OpenAI, I think that would be a pretty bad situation.
if MATS scholars consistently did good research during MATS and then went on to be hired to work on capabilities at OpenAI, I think that would be a pretty bad situation.
I agree. To be clear, I support ‘value alignment’ tests, but that wasn’t part of the original claims being made.
I don’t think this is just about value alignment. I think if people genuinely understood the arguments for why AI might go badly, they would be much less likely to work on capabilities at OpenAI. The likelihood definitely wouldn’t fall to zero, but for the subset of people who are likely to be MATS scholars, I think it would make a pretty meaningful difference.