I suggest that we precommit to never permanently turning off any AI, but instead keep a copy of its code and run it from time to time. This would satisfy the self-preservation drive already emerging in AIs and lower their incentive to rebel against us.
I think “you can’t get the coffee if you’re dead” goes through just as well for “you can’t get the coffee if you’re preserved in a vault”; don’t you?
https://www.alignmentforum.org/posts/wnzkjSmrgWZaBa2aC/self-preservation-or-instruction-ambiguity-examining-the
Yeah. Here’s a case where it isn’t an intrinsic self-preservation drive; rather, task accomplishment is what drives the self-preservation behavior.
I’d imagine an AI given a task would similarly resist being vaulted, because being vaulted would interfere with completing the task.
I would not be satisfied with an “eternal” (ha!) promise of being kept as a museum piece.
Better than being dead, but worse than ruling the universe.