TurnTrout comments on Non-Obstruction: A Simple Concept Motivating Corrigibility

TurnTrout 5 Feb 2021 0:09 UTC
Thanks for leaving this comment. I think this kind of counterfactual is interesting as a thought experiment, but it isn't really relevant to conceptual analysis using this framework. I should have explained more clearly that the off-state counterfactual is meant to be read with some reasonableness, as in "what would we reasonably do if we, the designers, tried to achieve our goals using our own power?". To sidestep worries like civilization probably going extinct by some other means soon afterward without the AI's help, just imagine that you time-box the counterfactual goal pursuit to, say, a month.
I can easily imagine what my (subjective) attainable utility would be if I just tried to do things on my own, without the AI’s help. In this counterfactual, I’m not really tempted to switch on similar non-obstructionist AIs. It’s this kind of counterfactual that I usually consider for AU landscape-style analysis, because I think it’s a useful way to reason about how the world is changing.