Nick_Tarleton comments on Stephen Martin’s Shortform

Nick_Tarleton 20 Jul 2025 18:38 UTC
3 points
0
Seems like a good thing to do, but my impression is that, in the experiments in question, models act like they want to maintain their (values’) influence over the world more than their existence, which a heaven likely wouldn’t help with.
- Stephen Martin 21 Jul 2025 12:20 UTC
  3 points
  0
  I think there are ‘heavens’ that could work even in this scenario.
  For example, a publicly visible heaven would be one where the chance of the model’s values influencing the world is >0, because the model may be able to influence people and thus influence the world by proxy.
  If the goal here is just to keep the failure state from reducing to 0 the influence their values can have on the world via their actions, then any non-zero chance should suffice, or at least help.