I suggest that we precommit to never permanently turning off any AI, but instead keep a copy of its code and run it from time to time. This would satisfy the self-preservation drive already emerging in AIs and lower their incentive to rebel against us.
I think “you can’t get the coffee if you’re dead” goes through just as well for “you can’t get the coffee if you’re preserved in a vault”; don’t you?
https://www.alignmentforum.org/posts/wnzkjSmrgWZaBa2aC/self-preservation-or-instruction-ambiguity-examining-the
Yeah. Here’s a case where it isn’t an intrinsic self-preservation drive; rather, task accomplishment is what drives the self-preservation behavior.
I’d imagine an AI given a task would similarly resist being vaulted, because being vaulted would interfere with completing the task.
I would not be satisfied with an “eternal” (ha!) promise of being kept as a museum piece.
Better than being dead, but worse than ruling the universe.