This is the “corrigibility tag” referenced in this post, right?
Paul Christiano made this comment on the original:
I found this useful as an occasion to think a bit about corrigibility. But my guess about the overall outcome is that it will come down to a question of taste. (And this is similar to how I see your claim about the list of lethalities.) The exercise you are asking for doesn’t actually seem that useful to me. And amongst people who decide to play ball, I expect there to be very different taste about what constitutes an interesting idea or useful contribution.
and this seems to have been borne out, at least the “matter of taste” part. In my quick estimation, Eliezer’s list doesn’t seem nearly as clearly fundamental as the AGI Ruin list, and most of the difference between it and Jan Kulveit’s and John Wentworth’s attempts seems to be taste. It doesn’t look like one list is clearly superior apart from having missed a couple of items, or anything like that.
Yep, Eliezer was overly pessimistic here. Still fun to see the thing.