Stuart_Armstrong comments on Reversible changes: consider a bucket of water

Stuart_Armstrong 27 Aug 2019 3:09 UTC
LW: 4 AF: 3
0
AF

kicking the bucket into the pool perturbs most AUs. There’s no real “risk” to not kicking the bucket.

In this specific setup, no. But sometimes kicking the bucket is fine; sometimes kicking the metaphorical equivalent of the bucket is necessary. If the AI is never willing to kick the bucket—ie never willing to take actions that might, for certain utility functions, cause huge and irreparable harm—then it’s not willing to take any action at all.