Dehumanisation *errors*

In response to my post contrasting value learning with anthropomorphisation, steve2152 brought up the fact that dehumanisation can be seen as the opposite of anthropomorphisation.

I agree with this insight, but only when dehumanisation causes errors of interpretation. I was using empathy in the sense of “insight into the other agent”, rather than “sympathy with the other agent”.

In practice, dehumanisation does tend to cause errors. We see outgroups as more homogeneous, coherent, and organised than they actually are. Despite the suave psychopaths depicted in movies, psychopaths tend to be less effective at achieving their goals (as evidenced by the large number of psychopaths in prison). Torturers are less effective at extracting true information than classical interrogators.

Now, it’s not a universal law by any means, but it does seem that dehumanisation can often lead to errors, and from that perspective can be seen as a failure of value learning.

The meaning of errors

  • “Objection! Hold on just a minute!” screams the convenient strawman I have just constructed.

  • “You’ve claimed that ‘agent’s goals’ are interpretations by the outside observer; that you can model a human as perfectly rational, without being wrong. You’ve claimed that this is ‘structured white box knowledge’, which can’t be deduced from the agent’s policy or its algorithm.”

  • “Given that, how can you claim that anyone ‘fails’ at interpreting the goals of others, or that they make ‘errors’?”

This is a very valid point, strawman, but I’ve also pointed out that human theory of mind/empathy is very similar from human to human, and tends to agree with how we interpret our own goals. Because of this, there is a rough “universal human theory of mind”, i.e. a universal way of going from human policy to human preferences.

When I’m talking about errors, I’m talking about deviations from this ideal[1].
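To make that concrete, here is a minimal sketch of one way to formalise this (the preference names, interpreters, and numbers are all made up for illustration, not a formalism I’m committed to): treat each person’s theory of mind as a map from an observed policy to inferred preferences, take the rough “universal” theory of mind to be the aggregate of those maps, and measure an interpretation error as deviation from that aggregate.

```python
from statistics import mean

# Hypothetical inferred-preference scores for a single observed policy,
# as judged by several human interpreters (names and values invented).
interpretations = {
    "alice": {"wants_money": 0.8, "wants_status": 0.3},
    "bob":   {"wants_money": 0.7, "wants_status": 0.4},
    "carol": {"wants_money": 0.9, "wants_status": 0.2},
}

def universal_tom(interps):
    """Rough 'universal theory of mind': average the interpreters' inferences."""
    prefs = next(iter(interps.values())).keys()
    return {p: mean(i[p] for i in interps.values()) for p in prefs}

def interpretation_error(observer_inference, interps):
    """Deviation of one observer's inference from the shared interpretation."""
    ideal = universal_tom(interps)
    return sum(abs(observer_inference[p] - ideal[p]) for p in ideal)

# A dehumanising observer reads the target as more single-mindedly
# goal-driven than the shared interpretation does:
dehumaniser = {"wants_money": 1.0, "wants_status": 0.0}
print(interpretation_error(dehumaniser, interpretations))  # 0.5: a nonzero deviation
```

On this toy picture, the residual disagreement among the interpreters themselves is the irreducible uncertainty mentioned in the footnote; an “error” is a deviation well beyond that spread.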


  1. Because human theories of mind do not agree perfectly, there will always be an irreducible level of uncertainty in this ideal, but there is agreement on the broad strokes of it.
