Hot take: A big missing point in the list under "What should we be doing?" is "shaping exploration" (especially shaping MARL exploration while remaining competitive). It could become a big lever for reducing the risks from the 3rd threat model, which accounts for ~2/3 of the total risk estimated in the post. I would not be surprised if, in the next 0-2 years, it becomes a flourishing/trendy AI safety research domain.
Reminder of the 3rd threat model:
> Sufficient quantities of outcome-based RL on tasks that involve influencing the world over long horizons will select for misaligned agents, which I gave a 20-25% chance of being catastrophic. The core thing that matters here is the extent to which we are training on environments that are long-horizon enough that they incentivize convergent instrumental subgoals like resource acquisition and power-seeking.
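
To make the idea concrete, here is a minimal single-agent sketch of one shape this could take (all names, e.g. `ShapedExplorationWrapper`, `is_flagged`, `ChainEnv`, are hypothetical and mine, not from the post): leave the outcome-based task reward untouched, so the agent stays competitive on the outcome objective, and instead shape the exploration bonus so that states flagged as instrumentally convergent (e.g. resource acquisition) get explored less, rather than penalized in the task reward itself.

```python
import math
import random

class ShapedExplorationWrapper:
    """Hypothetical sketch: count-based novelty bonus, suppressed on flagged states.

    The task reward is passed through unchanged; only the exploration term
    is shaped, so the outcome objective itself is not modified.
    """

    def __init__(self, env, is_flagged, bonus_scale=0.1, penalty=0.5):
        self.env = env
        self.is_flagged = is_flagged  # hypothetical detector: state -> bool
        self.bonus_scale = bonus_scale
        self.penalty = penalty
        self.counts = {}

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, task_reward, done = self.env.step(action)
        self.counts[state] = self.counts.get(state, 0) + 1
        # Standard count-based exploration bonus: decays with visits.
        bonus = self.bonus_scale / math.sqrt(self.counts[state])
        if self.is_flagged(state):
            # Steer exploration away from flagged states; task reward untouched.
            bonus -= self.penalty
        return state, task_reward + bonus, done


class ChainEnv:
    """Toy 10-state chain; states >= 7 stand in for 'resource acquisition'."""

    def __init__(self, n=10):
        self.n, self.state = n, 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action in {-1, +1}
        self.state = max(0, min(self.n - 1, self.state + action))
        return self.state, float(self.state == self.n - 1), self.state == self.n - 1


env = ShapedExplorationWrapper(ChainEnv(), is_flagged=lambda s: s >= 7)
s = env.reset()
for _ in range(20):
    s, r, done = env.step(random.choice([-1, 1]))
    if done:
        s = env.reset()
```

In a MARL setting the same shaping term could be applied per agent; keeping shaped exploration competitive against unshaped training runs is exactly the open problem flagged above.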