It seems that one of the goals of religion is to put humans in a state of epistemic uncertainty about the payoff structure of their current game. Relatedly, your setup seems to imply that the AI is in a state of very high epistemic certainty.
I’m not sure how much epistemic uncertainty is needed, but you are correct that there is epistemic uncertainty for all parties. Given a probabilistic action filter, it is uncertain whether any particular action will entail the destruction of humanity, and this is common knowledge. I am not the first or only one to propose epistemic uncertainty on the part of the AI with respect to goals and actions. See Stuart Russell: https://arxiv.org/abs/2106.10394