[Question] What are the causal effects of an agent's presence in a reinforcement learning environment?

Jonas Kgomo 1 Mar 2022 21:57 UTC

Suppose we have an agent A trying to optimise for a reward R in an environment S.
How can we tell that the presence of the agent does not affect the environment, and that a measurement (observation) reflects the environment rather than only the agent?
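In a simulator you control, one way to make this question operational is a counterfactual comparison. Run the same environment twice with identical randomness (the same seed), once with the agent observing and once without, and check whether the environment's trajectory differs. The sketch below is a toy built on stated assumptions: the class names, the random-walk dynamics, and the fixed 0.5 "kick" per observation are all hypothetical. They were chosen only to contrast a passive observation with one that has back-action.

```python
import random


class PassiveEnv:
    """Hypothetical toy environment: the hidden state does a random walk; observing it changes nothing."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self):
        self.state += self.rng.gauss(0.0, 1.0)

    def observe(self):
        return self.state


class InvasiveEnv(PassiveEnv):
    """Hypothetical toy environment in which every observation disturbs the state (back-action)."""

    def observe(self):
        value = self.state
        self.state += 0.5  # assumed fixed disturbance caused by the act of observing
        return value


def final_state(env_cls, seed, observe, steps=100):
    env = env_cls(seed)
    for _ in range(steps):
        env.step()
        if observe:
            env.observe()
    return env.state


def observation_changes_env(env_cls, seed=0):
    # Same seed means the same random dynamics; the only difference is whether the agent observes.
    return final_state(env_cls, seed, observe=True) != final_state(env_cls, seed, observe=False)


print(observation_changes_env(PassiveEnv))   # False
print(observation_changes_env(InvasiveEnv))  # True
```

This only works because the simulator lets us replay identical randomness. In the physical world there is no second run with the same noise, which is part of what makes the question hard.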

This is related to the measurement problem in quantum mechanics. We can think of the agent as a particle in a quantum superposition. Consider an electron with two possible spin configurations, up and down, described by the state $a\,|{\uparrow}\rangle + b\,|{\downarrow}\rangle$ with $|a|^2 + |b|^2 = 1$. When we measure the state, the wavefunction collapses to a particular classical state: up (with probability $|a|^2$) or down (with probability $|b|^2$).
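The measurement step can be sketched classically with the Born rule. A measurement in the up/down basis gives "up" with probability $|a|^2$ and "down" with probability $|b|^2$, and the state is then replaced by the corresponding basis state. This is a minimal simulation for intuition only, assuming a measurement in the up/down basis; the function name `measure_spin` is hypothetical. The point relevant to the question is that the measurement changes the state it reports on. After collapse, repeated measurements agree, and the original superposition is gone.

```python
import math
import random


def measure_spin(a, b, rng):
    """Measure the state a|up> + b|down> in the up/down basis using the Born rule.

    Returns the outcome and the post-measurement (collapsed) state as a pair of amplitudes.
    """
    if not math.isclose(abs(a) ** 2 + abs(b) ** 2, 1.0):
        raise ValueError("state must be normalised: |a|^2 + |b|^2 = 1")
    if rng.random() < abs(a) ** 2:
        return "up", (1.0, 0.0)    # collapsed onto |up>
    return "down", (0.0, 1.0)      # collapsed onto |down>


rng = random.Random(42)
a = b = 1 / math.sqrt(2)  # equal superposition of up and down

# Many fresh copies of the same superposition: outcomes are random.
outcomes = [measure_spin(a, b, rng)[0] for _ in range(10_000)]
frac_up = outcomes.count("up") / len(outcomes)
print(frac_up)  # fraction of "up" outcomes; should be near |a|^2 = 0.5

# One copy measured repeatedly: after the first collapse, the outcome never changes.
outcome, (a2, b2) = measure_spin(a, b, rng)
assert all(measure_spin(a2, b2, rng)[0] == outcome for _ in range(100))
```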

Moreover, the observer effect notes that certain systems cannot be measured without affecting them.
The uncertainty principle, by contrast, states that certain pairs of quantities, such as position and momentum, cannot both be known with arbitrary precision at the same time.
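For reference, the standard quantitative form of the uncertainty principle for position $x$ and momentum $p$ is shown below. Here $\sigma$ denotes a standard deviation and $\hbar$ the reduced Planck constant.

$$\sigma_x \, \sigma_p \ge \frac{\hbar}{2}$$

It bounds the product of the two spreads rather than either one alone. Either quantity can be made arbitrarily sharp, but only at the cost of the other.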

Another way to state the problem: does the act of measuring the state affect the state of the environment?
How are the observations in physics different from the observations we make in RL?
Is the environment state $S_{1}$ a cause of the state $S_{2}$?
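In the standard Markov decision process formalism, the next state is sampled as $S_{t+1} \sim P(\cdot \mid S_t, A_t)$. So $S_1$ is a cause of $S_2$ whenever the transition function depends on its first argument. Observing the state is not itself an action and does not enter $P$, so in that formalism measurement has no effect by assumption; only actions do. In a simulator the causal question can be checked by intervention. Hold the action (and the random seed) fixed, set $S_1$ to different values, and see whether the distribution of $S_2$ changes. The sketch below uses a hypothetical toy MDP: a ring of `N` states, moves of -1, 0 or +1, and a `SLIP` probability of ignoring the action. All of these are assumptions for illustration.

```python
import random
from collections import Counter

# Hypothetical toy MDP: states are integers on a ring of size N; actions move -1, 0 or +1,
# and with probability SLIP the environment ignores the action and moves randomly.
N = 10
SLIP = 0.2


def transition(state, action, rng):
    """Sample S_{t+1} ~ P(. | S_t = state, A_t = action)."""
    if rng.random() < SLIP:
        action = rng.choice([-1, 0, 1])
    return (state + action) % N


def next_state_distribution(s1, action, samples=5_000, seed=0):
    """Empirical distribution of S_2 under the intervention do(S_1 = s1), with the action fixed."""
    rng = random.Random(seed)
    counts = Counter(transition(s1, action, rng) for _ in range(samples))
    return {s: c / samples for s, c in counts.items()}


# Intervene on S_1 while holding the action fixed: if the distribution of S_2 shifts,
# S_1 has a causal effect on S_2 in this model.
d_from_0 = next_state_distribution(s1=0, action=1)
d_from_5 = next_state_distribution(s1=5, action=1)
print(d_from_0 != d_from_5)  # True: the supports are disjoint ({9, 0, 1} vs {4, 5, 6})
```

Reusing the same seed for both interventions means any difference in $S_2$ comes from the change in $S_1$, not from different noise.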

TLW 2 Mar 2022 4:37 UTC
Simply asking “does it affect the environment” is not enough here.
Leaking, say, 1 pJ of thermal energy to the surroundings in a slightly different manner in the two cases (with and without the agent) technically affects the surroundings quite significantly, since thermal motion in a gas is chaotic after all. In practice, though, we would tend to call these minimal^[1] effects on the environment.
1. ^
  Assuming a human-livable environment, at least. Obviously this might be different if, e.g., your environment is all at 2 picokelvin or some such.
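TLW's point that chaotic dynamics amplify tiny differences can be illustrated with a standard toy chaotic system, the logistic map $x_{n+1} = r\,x_n(1 - x_n)$ at $r = 4$. This is an assumption-labelled stand-in, not a model of a gas. Two trajectories that start $10^{-12}$ apart become macroscopically different within a few dozen steps. Because $|f'(x)| \le 4$ for this map, the gap cannot exceed $0.1$ before roughly step 19, which the test checks.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n), a standard chaotic toy system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial conditions that differ by a "negligible" 1e-12.
traj_a = logistic_trajectory(0.2)
traj_b = logistic_trajectory(0.2 + 1e-12)
gaps = [abs(x - y) for x, y in zip(traj_a, traj_b)]

# First step at which the trajectories differ by more than 0.1 (None if they never do).
first_big = next((n for n, g in enumerate(gaps) if g > 0.1), None)
print(first_big)
```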

Charlie Steiner 4 Mar 2022 19:32 UTC
You might also be interested in the question of whether the costs of the agent’s own thinking can be included in an RL environment. Suppose I have a finite electricity budget and thinking harder will use electricity. It seems like I, as a human, am flexible enough to adjust my thinking style to some degree in response to constraints like this, but what happens to typical RL agents if they’re given negative reward for running out of electricity?
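One way to pose this in standard RL terms is to fold the cost of computation into the reward and make the remaining budget part of the state. The sketch below rests on several hypothetical choices, none of which come from the post: the names `BudgetState` and `step_reward`, the linear per-unit cost, and the fixed penalty for running out.

```python
from dataclasses import dataclass


@dataclass
class BudgetState:
    """Hypothetical environment state tracking the agent's remaining electricity budget."""
    energy_left: float


def step_reward(task_reward, compute_used, state, cost_per_unit=0.01, out_of_energy_penalty=-10.0):
    """Return (reward, new_state, done) for one step in which thinking costs energy.

    The reward is the task reward minus a linear charge for the compute spent this step;
    if the budget is exhausted, the episode ends with an extra (assumed) penalty.
    """
    remaining = state.energy_left - compute_used
    reward = task_reward - cost_per_unit * compute_used
    if remaining <= 0:
        return reward + out_of_energy_penalty, BudgetState(0.0), True
    return reward, BudgetState(remaining), False


r1, s1, done1 = step_reward(1.0, 50.0, BudgetState(100.0))  # reward about 0.5, 50 units left
r2, s2, done2 = step_reward(1.0, 60.0, s1)                  # budget exhausted: about -9.6, done
```

In many standard setups the agent's per-step computation (e.g. one forward pass of a fixed network) is not something the policy can choose. For an agent to trade thinking against energy, `compute_used` would have to be exposed as part of its action.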