Two arguments against “you must not be Dutch-bookable” that feel vaguely relevant here:
1) The _extent_ to which you are Dutch-bookable might matter. I.e., if you can pump $X per day from me, that only matters for large X. So viewing Dutch-bookability as binary might be misleading (see the sketch after this list).
2) Even if you are _in theory_ Dutch-bookable, it only matters if you can _actually_ be Dutch-booked. E.g., if I am the most powerful thing in the universe and control everything (say, a singleton AI), I could probably ensure that I never get into a situation where my incoherent goals could hurt me.
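A minimal money-pump sketch in Python to make point 1 concrete. Everything here is hypothetical and illustrative: the per-trade fee, the cyclic preference A > B > C > A, and the `pump` function are all assumptions, not anyone's actual model.

```python
# Hypothetical money pump: an agent with cyclic preferences A > B > C > A
# will pay a small fee for each "upgrade", so a bookie cycling it through
# its own preferences extracts fee * trades.

FEE = 0.01  # dollars the agent pays per preferred trade (assumed)

# Cyclic (intransitive) preference: the agent strictly prefers the next item.
PREFERS_NEXT = {"A": "B", "B": "C", "C": "A"}

def pump(agent_item: str, trades: int) -> float:
    """Cycle the agent through its preferences, collecting FEE each trade."""
    extracted = 0.0
    for _ in range(trades):
        agent_item = PREFERS_NEXT[agent_item]  # offer the strictly preferred item
        extracted += FEE                       # agent accepts and pays the fee
    return extracted

# The *rate* matters: at $0.01/trade, one trade per day, a year of pumping
# costs the agent only ~$3.65. Dutch-bookable, yes -- but cheaply so.
print(pump("A", trades=365))  # -> 3.65
```

The point of the sketch: incoherence here is a quantitative leak, not a binary failure, and whether it matters depends on the fee, the trade rate, and (per point 2) whether a bookie can get access at all.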
My takeaway: It shouldn't be necessary to build AI with a utility function. And it isn't sufficient to only defend against misaligned AIs with a utility function.