In particular, it looks like we’re close enough to being able to implement corrigibility that the largest remaining obstacle is being able to observe how corrigible an AI is.
That’s a wild claim to make without reference to specific papers or milestones. I’m not fully up on ‘superalignment’ progress, but last I looked no one on the modern-paradigm side was seriously attempting to study corrigibility, let alone making this kind of progress. And results like Golden Gate Claude and the ‘buggy code → evil’ transformation indicated it was probably just as hard and unnatural as in the MIRI paradigm.
CAST is a great idea and seems like the most promising way forward with architectures similar to the ones we have, but I do not see any reason to believe we could, if we had a corrigibility meter, build an AI that implemented corrigibility with reasonable robustness within a year. Five years would probably be enough, but at that point you’re looking at needing at least one, and maybe 2-3, major insights.
The progress that I’m referring to is Max Harms’ work, which I tried to summarize here.