This is pretty insightful, but I’m not sure the assumption holds that we would halt development if there were unsolved legible problems. The core issue might not be illegibility, but a risk-tolerance threshold in leadership that’s terrifyingly high.
Even if we legibly showed the powers that be that an AI had a 20% chance of catastrophe from unsolved safety problems, I’d expect competitive pressure to lead them to deploy such a system anyway.
I’ve read this story plenty of times before, but this was the first time I saw it on LessWrong. That was a pleasant surprise.