I’m Jérémy Perret. Based in France. PhD in AI (NLP). AI Safety & EA meetup organizer. Information sponge. Mostly lurking since 2014. Seeking more experience, and eventually a position, in AI safety/governance.
Extremely annoyed by the lack of an explorable framework for AI risk/benefits. Working on that.
Two separate points:
- Compared to physics, the field of alignment has a slow-changing set of questions (e.g. corrigibility, interpretability, control, goal robustness) but a fast-evolving subject matter, as capabilities progress. I use the analogy of a biologist suddenly working in a place where evolution runs 1000x faster: some insights get stale very fast, and it's hard to know in advance which ones. Keeping up with the frontier, then, serves to check whether one's work still seems relevant (or where to send newcomers). Agent foundations, as a class of research agendas, was the answer to this volatility, but progress is slow and the ground keeps shifting.
- There is some effort to unify alignment research, or at least to provide a textbook that gets readers to the frontier. My prime example is the AI Safety Atlas; I would also count the BlueDot courses as structure-building, and AIsafety.info as giving some initial directions. There's also a host of papers attempting to categorize the sub-problems, but they're not focused on tentative answers.