This feels like it’s the same thing as the low-impact paper.
https://www.dropbox.com/s/cjt5t6ny5gwpcd8/Low_impact_S%2BB.pdf?raw=1
There the AI must maximise U = u − μR, where μ > 0 is a weight, u is a positive goal, and R is a penalty function for impact.
Am I mistaken in thinking that T is the same as u and I is the same as −μR?
It’s similar but not the same: my proposal is only about learning a model of impact, and has nothing to do with the agent’s utility function.
You could, however, use the learned impact function I to help measure (and penalize) impact.
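To make that last point concrete, here is a minimal sketch (my own illustration, not from either post) of how a separately learned impact model could later be dropped in as the penalty term R of the low-impact objective U = u − μR. All names here (`task_utility`, `impact_model`, `mu`) are hypothetical stand-ins.

```python
def penalized_utility(state, action, task_utility, impact_model, mu=1.0):
    """U = u - mu * R, with the penalty R supplied by a learned impact model.

    The impact model is trained independently of the agent's utility;
    it only becomes part of the objective if we choose to plug it in here.
    """
    u = task_utility(state, action)    # positive task goal u (the post's T)
    r = impact_model(state, action)    # learned impact estimate (the post's I)
    return u - mu * r                  # penalized objective U = u - mu * R


# Toy usage with stand-in functions:
if __name__ == "__main__":
    task_utility = lambda s, a: 1.0 if a == "fetch_coffee" else 0.0
    impact_model = lambda s, a: 0.8 if a == "demolish_wall" else 0.1
    print(penalized_utility("kitchen", "fetch_coffee",
                            task_utility, impact_model, mu=0.5))
```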