I decided not to include an example in the post, since it focuses directly on a controversial issue, but one case where this principle was violated, and made people unreasonably confident, was back in 2007-2008, when people updated toward AI risk being a big deal (or at least assigned it uncomfortably high probabilities) based on the orthogonality thesis and instrumental convergence, which attacked and demolished two bad arguments common at the time:
1. that smarter AI would necessarily be good (unless we deliberately programmed it not to be), because it would be smart enough to figure out what's right, what we intended, etc.; and
2. that smarter AI wouldn't lie to us, hurt us, manipulate us, take resources from us, etc. unless it wanted to (e.g. because it hates us, or because it has been programmed to kill), which it probably wouldn't.
The core issue here, and where I diverge from Daniel Kokotajlo, is that I think bad criticisms like these will always appear regardless of whether the doom-by-default case is true, for the reasons Andy Masley discussed in the post, and thus the fact that the criticisms are false is basically not an update at all (and the same goes for AI optimists debunking clearly bad AI doom arguments).
This is related to “your arguments can be false even if nobody has refuted them” and “other people are wrong vs. I am right”.
It’s also my top hypothesis for why MIRI became the way it did over time, as it responded and updated based on the existence of bad critics (though I do want to note that even assuming a solution to corrigibility existed, MIRI would likely never have found it, because it’s a very small group trying to tackle big problems).