I think you’re right that the central problems remaining are in the ontological cluster, as well as the theory-practice gap of making an agent that doesn’t override its hard-coded false beliefs.
But less centrally, I think one issue with the proposal is that the sub-agents need to continue operating in worlds where they believe in a logical contradiction. How does this work? (I think this is something I’m confused about for all agents and this proposal just brings it to the surface more than usual).
Also, agent1 and agent2 combine into some kind of machine. This machine isn’t VNM rational. I want to be able to describe this machine properly. Pattern matching, my guess is that it violates independence in the same way as here. [Edit: Definitely violates independence, because the combined machine should prefer a lottery over <button-pressed> to certainty of either outcome. I suspect that it doesn’t have to violate any other axioms].
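To make the independence violation concrete, here is a minimal sketch. The utilities and the min-based (Rawlsian) aggregation rule are purely illustrative assumptions, not the proposal's actual negotiation mechanism:

```python
# Two subagents with opposed preferences over the button state.
u1 = {"pressed": 1.0, "unpressed": 0.0}  # agent1 wants the button pressed
u2 = {"pressed": 0.0, "unpressed": 1.0}  # agent2 wants it unpressed

def expected_utility(u, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[outcome] for outcome, p in lottery.items())

def combined_score(lottery):
    """Illustrative aggregation: rank lotteries by the worst-off
    subagent's expected utility (a min/Rawlsian rule)."""
    return min(expected_utility(u1, lottery), expected_utility(u2, lottery))

certain_pressed = {"pressed": 1.0}
certain_unpressed = {"unpressed": 1.0}
fifty_fifty = {"pressed": 0.5, "unpressed": 0.5}

# The mixture is strictly preferred to both of its pure components...
assert combined_score(fifty_fifty) > combined_score(certain_pressed)
assert combined_score(fifty_fifty) > combined_score(certain_unpressed)
# ...which no VNM-rational agent can do: independence implies a mixture
# of A and B is never strictly preferred to both A and B.
```

Any aggregation rule with this shape (strictly preferring the mixture to both endpoints) exhibits the same violation, whatever the details of the negotiation.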
> I think one issue with the proposal is that the sub-agents need to continue operating in worlds where they believe in a logical contradiction… I think this is something I’m confused about for all agents and this proposal just brings it to the surface more than usual
+1 to this. For the benefit of readers: the “weirdness” here is common to CDT agents in general. In some sense they’re acting-as-though they believe in a do()-operated model, rather than their actual belief. Part of the answer is that the do()-op is actually part of the planning machinery, and part of the answer is Abram’s CDT=EDT thing, but I haven’t grokked the whole answer deeply enough yet to see how it carries over to this new use-case.
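For readers unfamiliar with the distinction being drawn here, a toy illustration of conditioning versus the do()-operator in a two-node model Rain → Wet (the numbers are made up):

```python
# Toy causal model: Rain -> Wet.
P_rain = 0.3
P_wet_given_rain = 0.9
P_wet_given_dry = 0.1

# Conditioning (evidential): observing Wet raises our credence in Rain.
P_wet = P_rain * P_wet_given_rain + (1 - P_rain) * P_wet_given_dry
P_rain_given_wet = P_rain * P_wet_given_rain / P_wet  # ~0.794

# Intervening (causal): do(Wet) severs the Rain -> Wet edge, so
# forcing the ground to be wet tells us nothing about Rain.
P_rain_given_do_wet = P_rain  # 0.3
```

A CDT agent plans using the second quantity where its action is the intervened-on node, even though its honest epistemic state, on observing the action, would look like the first.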
> Definitely violates independence, because the combined machine should prefer a lottery over <button-pressed> to certainty of either outcome.
Assuming I’m interpreting you correctly, this is non-obvious, because the lottery-choice will be one of many things the two agents negotiate over. So it could be that the negotiation shakes out to the certainty option, with some other considerations counterbalancing elsewhere in the negotiation.
More generally, insofar as the argument in Why Not Subagents? generalizes, the subagents should aggregate into an expected utility maximizer of some sort. But the probabilities of the resulting agent don’t necessarily match the epistemic probabilities of the original model—e.g. the agent’s probability on button state mostly reflects the relative bargaining power of the subagents rather than an epistemic state.
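A sketch of the “bargaining power as probability” point. The setup is hypothetical: each subagent only cares about outcomes in its own button-state branch, and the aggregate maximizes a weighted sum of the two utilities:

```python
# Each subagent evaluates outcomes only within its own branch.
def u_pressed(outcome):    # subagent caring about pressed-worlds
    return {"good": 1.0, "bad": 0.0}[outcome]

def u_unpressed(outcome):  # subagent caring about unpressed-worlds
    return {"good": 1.0, "bad": 0.0}[outcome]

# Weights set by the negotiation, not by evidence about the button.
w_pressed, w_unpressed = 0.7, 0.3

def aggregate_utility(outcome_if_pressed, outcome_if_unpressed):
    """Weighted-sum aggregate: formally identical to an expected
    utility where P(button state) equals each subagent's weight."""
    return (w_pressed * u_pressed(outcome_if_pressed)
            + w_unpressed * u_unpressed(outcome_if_unpressed))

# The aggregate behaves like an EU maximizer with P(pressed) = 0.7,
# regardless of the model's epistemic probability of the button state.
assert aggregate_utility("good", "bad") > aggregate_utility("bad", "good")
```

The point is that the weights play the formal role of probabilities in the aggregate's expected-utility representation, so reading them off the agent's behavior tells you about the bargain, not about its beliefs.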