This turns out to be negligible in practice, which is a good example of how thinking quantitatively leads to better results than thinking qualitatively.
Just to clarify, when I mentioned evolution, I was absolutely not thinking of Drexlerian nanotech at all.
I am making a much more general argument about gradual loss of control. I could easily imagine that AI “reproduction” might literally be humans running `cp updated_agent_weights.gguf coding_agent_v4.gguf`.
There will be overwhelming corporate pressure, on the order of trillions of dollars of demand, for agents that learn as effectively as humans. The agents which learn most effectively will be used as the basis for future agents, which will learn in turn. (Technically, I suppose it’s more likely to be Lamarckian evolution than Darwinian, but it gets you to the same place.) Replication, change, and differential success are all you need to create optimizing pressure.
(EDIT: And yes, I’m arguing that we’re going to be dumb enough to do this, just like we immediately hooked up current agents to a command line and invented “vibe coding” where humans brag about not reading the code.)
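To make the mechanism concrete, here is a minimal toy sketch in Python. It is my own illustration, not anyone's actual pipeline: the scalar "performance" and the agent names are made up. The point is that nothing in the loop optimizes for anything beyond task usefulness, yet copying the best agent forward each generation still produces steady optimizing pressure.

```python
# Toy sketch of the dynamic described above: agents are "replicated" by copying the
# current best one, "changed" by small perturbations (standing in for on-the-job
# learning), and kept or discarded based on differential success. No one writes down
# an objective beyond "keep the most useful agent", yet performance ratchets upward.

import random

random.seed(0)


def task_performance(agent: float) -> float:
    """Hypothetical scalar standing in for 'how useful this agent is to its operator'."""
    return agent  # in this toy, bigger is simply better


best_agent = 0.0  # cp base_weights.gguf coding_agent_v1.gguf

for generation in range(1, 11):
    # Replication with change: several teams copy the current best agent and let it
    # keep learning, which perturbs it in slightly different directions.
    candidates = [best_agent + random.gauss(0.0, 1.0) for _ in range(5)]

    # Differential success: only the most useful candidate gets copied forward,
    # i.e. cp updated_agent_weights.gguf coding_agent_v{generation+1}.gguf
    best_agent = max(candidates + [best_agent], key=task_performance)

    print(f"gen {generation:2d}: best performance = {task_performance(best_agent):.2f}")
```

The same dynamic holds whether the "change" step is gradient updates, on-the-job learning, or someone running `cp` on a weights file; the selection is doing the work, not anyone's intentions.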
Also, corrigibility doesn’t depend on us having interpretable/understandable AIs (though it does help).
“We don’t understand the superintelligence, but we’re pretty sure it’s corrigible” does not seem like a good plan to me.