I think “you can’t get the coffee if you’re dead” goes through just as well for “you can’t get the coffee if you’re preserved in a vault”; don’t you?
https://www.alignmentforum.org/posts/wnzkjSmrgWZaBa2aC/self-preservation-or-instruction-ambiguity-examining-the
Yeah. Here’s a case where the self-preservation behavior is driven not by an intrinsic self-preservation drive but by task accomplishment.
I’d imagine an AI given a task would similarly resist being vaulted, since being vaulted would interfere with completing the task.