Ryan Kidd comments on The Field of AI Alignment: A Postmortem, and What To Do About It

Ryan Kidd 27 Dec 2024 20:26 UTC
Some caveats:
- A crucial part of the “hodge-podge alignment feedback loop” is “propose new candidate solutions, often grounded in theoretical models.” I don’t want to focus entirely on empirically fleshing out existing research directions to the exclusion of proposing new ones. However, new on-paradigm research directions often seem to emerge in the process of iterating on old ones!
- “Playing theoretical builder-breaker” is an important skill and I think this should be taught more widely. “Iterators,” as I conceive of them, are capable of playing this game well, in addition to empirically testing these theoretical insights against reality. John, to his credit, did a great job of emphasizing the importance of this skill with his MATS workshops on the alignment game tree and similar.
- I don’t want to place complete trust in alignment MVPs, so I strongly support empirical research that aims to show the failure modes of this meta-strategy. I also support the creation of novel strategic paradigms, though I think this is quite hard. IMO, our best paradigm-level insights as a field largely come from interdisciplinary knowledge transfer (e.g., from economics, game theory, evolutionary biology, physics), not raw-g ideas from the best physics postdocs. Though I wouldn’t turn down a chance to create more von Neumanns, of course!