I think this is probably going to do something quite different from the conceptual version of AUP, because impact (as defined in this sequence) occurs only when the agent’s beliefs change, which doesn’t happen for optimal agents in deterministic environments. The current implementation of AUP tries to get around this using proxies for power (but these can be gamed) or by defining “dumber” beliefs that power is measured relative to (but this fails to leverage the AI system’s understanding of the world).
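For reference, the proxy I have in mind is the penalty from the current implementation, sketched roughly here (the exact baseline and scaling differ across versions): the agent's reward is docked by how much an action shifts its attainable utility for a set of auxiliary reward functions $R_1, \dots, R_n$, relative to doing nothing,

$$R_{\text{AUP}}(s,a) \;=\; R(s,a) \;-\; \frac{\lambda}{n}\sum_{i=1}^{n}\bigl|\,Q^{*}_{R_i}(s,a) - Q^{*}_{R_i}(s,\varnothing)\,\bigr|,$$

so the auxiliary $Q$-values stand in for power, and "gaming" the penalty means finding ways to gain power without moving them much.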
Although the point is most easily made in deterministic environments, impact doesn’t happen in expectation for optimal agents in stochastic environments, either. This follows from conservation of expected AU (the point I was making in The Gears of Impact).
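Spelling that out a bit (a minimal sketch, writing AU for attainable utility and $E$ for whatever evidence the agent observes next): if the agent’s beliefs are calibrated, the law of iterated expectations gives

$$\mathbb{E}_{E}\bigl[\,\mathbb{E}[\mathrm{AU}\mid E]\,\bigr] \;=\; \mathbb{E}[\mathrm{AU}],$$

so the current AU estimate already prices in every way the beliefs could shift; any particular observation can move it, but not in expectation.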
Similar things can be said about power gain: when we think an agent is gaining power, gaining power compared to what? The agent “always had” that power, in a sense – the only thing that happens is that we come to realize it.
This line of argument makes me more pessimistic about there being a clean formalization of “don’t gain power”. I do think that the formalization of power is correct, but I suspect we’re doing something heuristic and possibly kludgy when we think about someone else gaining power.
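For concreteness, the formalization I mean is roughly “average optimal value attainable from a state, over some distribution of goals”, up to normalization. A toy sketch (the three-state MDP and the uniform reward distribution are purely illustrative):

```python
import numpy as np

def optimal_values(P, reward, gamma=0.9, iters=500):
    """Value iteration. P[a] is the |S|x|S| transition matrix for action a,
    reward[s] is a state-based reward. Returns V*(s) for every state."""
    V = np.zeros(len(reward))
    for _ in range(iters):
        # Q(s, a) = r(s) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = np.stack([reward + gamma * P_a @ V for P_a in P])
        V = Q.max(axis=0)
    return V

def power(P, state, gamma=0.9, n_samples=1000, seed=0):
    """Average optimal value at `state` over reward functions drawn from D
    (here D = uniform on [0, 1]^|S|, just as an illustrative choice)."""
    rng = np.random.default_rng(seed)
    n_states = P[0].shape[0]
    values = [
        optimal_values(P, rng.uniform(size=n_states), gamma)[state]
        for _ in range(n_samples)
    ]
    return float(np.mean(values))

# Tiny example: state 0 is a dead end (both actions self-loop there);
# from states 1 and 2 the agent can either stay put or toggle between them,
# so it can settle wherever reward happens to be highest.
stay = np.eye(3)
move = np.array([[1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
P = [stay, move]

print(power(P, state=0))  # ~ E[r]/(1-gamma): no options left
print(power(P, state=1))  # higher: the agent can still choose where to end up
```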
Yup, strongly agree. I focused on the deterministic case because the point is easiest to understand there, but the same arguments apply in the stochastic case.
I suspect we’re doing something heuristic and possibly kludgy when we think about someone else gaining power.
I agree, though if I were trying to have a nice formalization, one thing I might do is look at what “power” looks like in a multiagent setting, where you can’t be “larger” than the environment, and so you can’t have perfectly calibrated beliefs about what’s going to happen.