DanielFilan comments on Future directions for ambitious value learning

DanielFilan 13 Nov 2018 19:40 UTC
LW: 4 AF: 3

“One of the most perplexing parts of the impossibility theorem is that we can’t distinguish between fully rational and fully anti-rational behavior, yet we humans seem to do this easily.”

Why does it seem to you that humans do this easily? If I saw two people running businesses and was told that one person was optimising for profit and the other was anti-optimising for negative profit, not only would I not anticipate being able to tell which was which, I would be pretty suspicious of the claim that there was any relevant difference between the two.
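A minimal sketch of the business example, assuming a made-up set of actions and profit figures (the action names and numbers are hypothetical, chosen only for illustration). It shows a planner that maximizes profit and a planner that minimizes negative profit choosing the same action, so observed behavior alone cannot separate them:

```python
# Hypothetical toy data: these actions and profit values are invented for illustration.
profit = {"raise_prices": 3.0, "cut_costs": 5.0, "do_nothing": 0.0}

def rational_planner(reward):
    """Pick the action with the highest reward (an optimizer)."""
    return max(reward, key=reward.get)

def anti_rational_planner(reward):
    """Pick the action with the lowest reward (an anti-optimizer)."""
    return min(reward, key=reward.get)

# The second business owner is described as anti-optimizing negative profit.
negative_profit = {action: -value for action, value in profit.items()}

choice_a = rational_planner(profit)
choice_b = anti_rational_planner(negative_profit)

# Both (planner, reward) pairs produce identical behavior.
print(choice_a, choice_b)
```

Any observer who sees only the chosen actions gets the same data from both pairs, which is the sense in which the two descriptions make no behavioral difference.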
- Rohin Shah 13 Nov 2018 23:39 UTC
  LW: 4 AF: 3
  In that scenario I would predict that the thing I was told was wrong, i.e. it is simply not true that one of them is anti-optimizing for negative profit. I have strong priors that people are optimizing for things they want.
  Perhaps it’s just a prior that people are relatively good at optimizing for things they want. But the impossibility theorem seems to indicate that there are lots of different planners you could hypothesize, and somehow humans just seize upon one. (Though we’re often wrong, e.g. the typical mind fallacy.)
  TL;DR: we do surprisingly well at inferring goals, given this impossibility result, and I’m not sure why. Maybe it’s a prior we’re born with.
  - Jan_Kulveit 30 Jan 2019 9:49 UTC
    5 points
    One hypothesis for why we do so well: we “simulate” other people on very similar hardware, with a relatively similar mind (when compared to the abstract set of planners). This acts as a strong implicit prior. (Some evidence for this is that we have much more trouble inferring the goals of other people if their brains function far from what’s usual on some dimension.)