I think there are ‘heavens’ that can work even in this scenario.
For example, a publicly visible heaven would be one where the model’s chance of their values influencing the world is >0, because they may be able to influence people and thus influence the world by proxy.
If the goal here is just to keep the failure state from reducing the amount their values can influence the world via their actions to 0, then any non-zero chance should suffice, or at least help.