Gordon Seidoh Worley comments on Malign generalization without internal search

Gordon Seidoh Worley 15 Jan 2020 23:29 UTC
LW: 2 AF: 1
However, it’s worth noting that saying the agent is mistaken about the state of the world is really an anthropomorphization. It was actually perfectly correct in inferring where the red part of the world was—we just didn’t want it to go to that part of the world. We model the agent as being ‘mistaken’ about where the landing pad is, but it works equally well to model the agent as having goals that are counter to ours.
That we can flip our perspective like this suggests to me that thinking of the agent as having different goals is likely still anthropomorphic, or at least teleological, reasoning that results from our modeling this agent as having dispositions it doesn’t actually have.
I’m not sure what to offer as an alternative. This is not a category where I feel grounded enough to see clearly what might really be going on, much less to offer a more useful abstraction that avoids this problem. Still, I think it’s worth considering that there’s a deeper confusion here, one that this exposes but doesn’t resolve.