we don’t tend to imagine humans directly building superintelligence
Speak for yourself! Humans directly built AlphaZero, which is a superintelligence for board game worlds. So I don’t think it’s out of the question that humans could directly build a superintelligence for the real world. I think that’s my main guess, actually?
(Obviously, the humans would be “directly” building a learning algorithm etc., and then the trained weights come out of that.)
(OK sure, the humans will use AI coding assistants. But I think AI coding assistants, at least of the sort that exist today, aren’t fundamentally changing the picture, but rather belong in the same category as IDEs and PyTorch and other such mundane productivity-enhancers.)
(You said “don’t tend to”, which is valid. My model here [AI paradigm shift → superintelligence very quickly and with little compute] does seem pretty unusual with respect to today’s alignment community zeitgeist.)
Fair point, and it's plausible that I'm taking a certain subset of development pathways too much for granted. That is: in the essay I focus on threat models that proceed via the automation of capabilities R&D, but it's possible that this step isn't necessary.