Vanessa Kosoy comments on The hard core of alignment (is robustifying RL)

Vanessa Kosoy 16 May 2026 16:41 UTC
7 points
Another such approach is computational superimitation (COSI), which seems to make a totally different set of assumptions (which very few people understand well enough to question). I hope that Vanessa Kosoy and Diffractor do not unilaterally decide that they have properly specified alignment, and then actually try to build an ASI based on COSI.
(I haven’t read the entire post yet, just wanted to respond to this point. The following is on behalf of myself and CORAL, but Diffractor might have his own take.)
I hope we will build ASI based on COSI (or some evolution of COSI), but only when:
1. The theory is much, much more developed.
2. The assumptions are extensively validated in theory, by some combination of
   1. Reducing the assumptions as much as possible to a simple and intuitive core.
   2. Studying the theoretical implications of the assumptions in detail, to see that they lead to a comprehensive, coherent and convincing mathematico-philosophical view.
   3. Tying the assumptions to knowledge in other fields, such as physics, cognitive science and evolutionary biology.
3. The assumptions are extensively validated in practice, by building scaled-down models and studying them with interpretability tools that also come out of the theory.
4. Waiting for an even stronger validation is infeasible because unaligned ASI is about to emerge from other projects, and the other projects refuse to coordinate on a pause.
As to “unilaterally”, we are very interested in thoughtful critique from other researchers. We are also going to vocally support a global AI moratorium that would apply to us. But, if there is no moratorium, we don’t commit to waiting for a global academic consensus that will never come (see point 4 above).