It seems more important whether humans can figure out how to evaluate alignment in 2028 than whether they can make human-level aligned AGIs (though of course the latter is instrumentally useful and correlated). In particular, the AIs need to prevent humans from discovering the method by which the AIs evaluate alignment. This seems probably doable for ASIs, but it may be a significant constraint, especially for only somewhat superhuman AIs, if they have e.g. solved mech interp and applied it themselves but need to hide this for a long time.