Vlad Mikulik comments on What failure looks like

Vlad Mikulik 27 Mar 2019 0:22 UTC
LW: 3 AF: 2
0
AF
The goal that the agent is selected to score well on is not necessarily the goal that the agent is itself pursuing. So, unless the agent’s internal goal matches the goal for which it’s selected, the agent might still seek influence because its internal goal permits that. I think this is in part what Paul means by “Avoiding end-to-end optimization may help prevent the emergence of influence-seeking behaviors (by improving human understanding of and hence control over the kind of reasoning that emerges)”
- TurnTrout 27 Mar 2019 15:56 UTC
  LW: 2 AF: 1
  0
  AF Parent
  And if the internal goal doesn’t permit that? I’m trying to feel out which levels of meta are problematic in this situation.