zulupineapple comments on Humans can be assigned any values whatsoever…

zulupineapple 27 Aug 2019 19:57 UTC
1 point
0
1 is trivial, so yes. But I don’t agree with 2. Maybe the disagreement comes from “few” and “obvious”? To be clear, I count evaluating some simple statistic on a large data set as one constraint. I’m not so sure about “obvious”. It’s not yet clear to me that my simple constraints aren’t good enough. But if you say that more complex constraints would give us a lot more confidence, that’s reasonable.
From OP I understood that you want to throw out IRL entirely. e.g.
If we give up the assumption of human rationality—which we must—it seems we can’t say anything about the human reward function. So it seems IRL must fail.
seems like an unambiguous rejection of IRL and very different from
Our hope is that with some minimal assumptions about planner and reward we can infer the rest with enough data.
- Stuart_Armstrong 27 Aug 2019 21:31 UTC
  4 points
  0
  Parent
  Ok, we strongly disagree on your simple constraints being enough. I’d need to see these constraints explicitly formulated before I had any confidence in them. I suspect (though I’m not certain) that the more explicit you make them, the more tricky you’ll see that it is.
  
  And no, I don’t want to throw IRL out (this is an old post), I want to make it work. I got this big impossibility result, and now I want to get around it. This is my current plan: https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into
  - zulupineapple 28 Aug 2019 8:06 UTC
    1 point
    0
    Parent
    I got this big impossibility result
    That’s a part of the disagreement. In the past you clearly thought that Occam’s razor was an “obvious” constraint that might work. Possibly you thought it was a unique such constraint. Then you found this result, and made a large update in the other direction. That’s why you say the result is big—rejecting a constraint that you already didn’t expect to work wouldn’t feel very significant.
    On the other hand, I don’t think that Occam’s razor is unique such constraint. So when I see you reject it, I naturally ask “what about all the other obvious constraints that might work?”. To me this result reads like “0 didn’t solve our equation therefore the solution must be very hard”. I’m sure that you have strong arguments against many other approaches, but I haven’t seen them, and I don’t think the one in OP generalizes well.
    I’d need to see these constraints explicitly formulated before I had any confidence in them.
    This is a bit awkward. I’m sure that I’m not proposing anything that you haven’t already considered. And even if you show that this approach is wrong, I’d just try to put a band-aid on it. But here is an attempt:
    First we’d need a data set of human behavior with both positive and negative examples (e.g. “I made a sandwitch”, “I didn’t stab myself”, etc). So it would be a set of tuples of state s, action a and +1 for positive examples, −1 for negative ones. This is not trivial to generate, especially it’s not clear how to pick negative examples, but here too I expect that the obvious solutions are all fine. By the way, I have no idea how the examples are formalized, that seems like a problem, but it’s not unique to this approach, so I’ll assume that it’s solved.
    Next, given a pair (p, R), we would score it by adding up the following:
    1. p(R) should accurately predict human behavior. So we want a count of p(R)(s)=a for positive cases and p(R)(s)!=a for negative cases.
    2. R should also predict human behavior. So we want to sum R(s, a) for positive examples, minus the same sum for negative examples.
    3. Regularization for p.
    4. Regularization for R.
    Here we are concerned about overfitting R, and don’t care about p as much, so terms 1 and 4 would get large weights, and terms 2, 3 would get smaller weights.
    Finally we throw machine learning at the problem to maximize this score.