michaelcohen comments on [missing post]

michaelcohen 9 May 2019 5:11 UTC
> defining the evaluator is a fuzzy problem

I’m not sure what you mean by this. We don’t need a mathematical formulation of the evaluator; we can grab one from the real world.
> if you don’t have the right formalism, you’re going to get Goodharting on incorrect conceptual contours

I would agree with this for a “wrong” formalism of the evaluator, but we don’t need a formalism of the evaluator. A “wrong” formalism of “deception” can’t affect agent behavior, because “deception” is not a concept used in constructing the agent; it’s only a concept used in arguments about how the agent behaves. So “Goodharting” seems like the wrong description of the danger here. The danger of using a wrong formalism in an argument is straightforward: the argument is garbage.
- TurnTrout 9 May 2019 16:01 UTC
  What do you mean, we can grab an evaluator? What I’m thinking of is similar to “IRL requires locating a human in the environment and formalizing their actions, which seems fuzzy”.
  
  And if we can’t agree informally on how to define deception, then I’m asking: how can we say a proposal has that property?
  - michaelcohen 10 May 2019 0:21 UTC
    An evaluator sits in front of a computer, sees the interaction history (actions, observations, and past rewards), and enters rewards.
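The protocol in the last comment can be sketched as a reward loop in which the evaluator is simply an input channel, not a formal model. This is a minimal Python sketch under stated assumptions: the names `Step`, `run_episode`, `console_evaluator`, and the scripted stand-ins (`echo_env`, `counting_policy`, `length_evaluator`) are hypothetical illustrations, not part of any proposal in this thread. In deployment the evaluator would be the person at the terminal (`console_evaluator`); the scripted evaluator exists only so the loop can run unattended.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One round of interaction: the agent's action, the resulting observation,
    and the reward the evaluator entered afterwards (None until entered)."""
    action: str
    observation: str
    reward: Optional[float] = None


History = List[Step]
# Hypothetical: an evaluator is anything that maps the interaction history to a reward.
Evaluator = Callable[[History], float]


def console_evaluator(history: History) -> float:
    """A human at a terminal: show the history, then read a reward from the keyboard."""
    for i, step in enumerate(history):
        print(f"{i}: action={step.action!r} obs={step.observation!r} reward={step.reward}")
    return float(input("reward> "))


def run_episode(policy: Callable[[History], str],
                environment: Callable[[str], str],
                evaluator: Evaluator,
                steps: int) -> History:
    history: History = []
    for _ in range(steps):
        action = policy(history)           # agent acts on the full history so far
        observation = environment(action)  # the world responds
        history.append(Step(action, observation))
        # The evaluator sees actions, observations, and past rewards
        # (the newest step's reward is still None), then enters a reward.
        history[-1].reward = evaluator(history)
    return history


# Hypothetical scripted stand-ins so the loop runs without a human.
def echo_env(action: str) -> str:
    return f"saw {action}"


def counting_policy(history: History) -> str:
    return f"a{len(history)}"


def length_evaluator(history: History) -> float:
    return float(len(history))


h = run_episode(counting_policy, echo_env, length_evaluator, steps=3)
print([(s.action, s.observation, s.reward) for s in h])
# → [('a0', 'saw a0', 1.0), ('a1', 'saw a1', 2.0), ('a2', 'saw a2', 3.0)]
```

Nothing in `run_episode` refers to a model of the evaluator: the reward is whatever the evaluator enters after seeing the history. Swapping `length_evaluator` for `console_evaluator` makes a person the reward source without changing the policy or the loop.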