Max Harms comments on Worlds Where Iterative Design Succeeds?

Max Harms 24 Oct 2025 18:16 UTC
4 points
0
(Also, we can, in fact, observe some of the AI's internals and run crude checks for things like deception. Prosaic interpretability isn’t great, but it’s also not nothing.)