adamShimi comments on Redwood Research’s current project

adamShimi 18 Oct 2021 13:03 UTC
LW: 2 AF: 1
AF
(This “better almost 50% of the time” property is one way of trying to operationalize “we don’t want the filtered policy to be worse”. It so happens that this property is actually kind of badly behaved, but in our case it seems fine, given that we’re always going to be comparing against a fixed unfiltered distribution.)
I’ve read the intransitive dice page, but I’m confused on how it might apply here? Like concretely, what are the dice in the analogy?
- Buck 20 Oct 2021 17:29 UTC
  LW: 6 AF: 3
  AF Parent
  Suppose you have three text-generation policies, and you define “policy X is better than policy Y” as “when a human is given a sample from both policy X and policy Y, they prefer the sample from the latter more than half the time”. That definition of “better” is intransitive.
  - adamShimi 21 Oct 2021 11:10 UTC
    LW: 3 AF: 2
    AF Parent
    Hum, I see. And is your point that it should not create a problem because you’re only doing comparison X vs Y and Z vs Y (where Y is the standard policy and X and Z are two of your conservative policies) but you don’t really care about the comparison between X and Z?