Vaniver comments on Proper value learning through indifference

Vaniver 29 Oct 2015 16:52 UTC
0 points

In even a rudimentary model of this form (of the kind that we can build today), pressure or manipulation will then screen off the inference from human utterances to human preferences.

This seems surprising to me, because I think a model that is able to determine the level of ‘pressure’ and ‘manipulation’ present in a situation is not rudimentary. That is, yes, if I have a model where “my preferences” have a causal arrow to “my utterances,” and the system can recognize that it’s intervening at “my utterances,” then it can’t readily infer anything about “my preferences.” But deciding where in the graph an intervention acts may be difficult, especially when the thing being modeled is a person’s mind.
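
To make the screening-off claim concrete, here is a minimal sketch of a two-cause toy model: a binary preference and a binary “pressured” flag both feed into a binary utterance. Everything in it is an assumption made for illustration (the function names, the 0.9 fidelity, the 0.5 prior, and the choice that pressure always forces a “yes”); none of it comes from the original post. Note that the sketch takes the probability of pressure as a given input, which is exactly the part I am claiming is hard to obtain.

```python
def utterance_likelihood(says_yes, prefers_yes, pressured, fidelity=0.9):
    """P(utterance | preference, pressure) in the toy model.

    Assumption: under pressure the person always says "yes", whatever they
    prefer. Without pressure, the utterance matches the preference with
    probability `fidelity`.
    """
    if pressured:
        p_yes = 1.0
    else:
        p_yes = fidelity if prefers_yes else 1.0 - fidelity
    return p_yes if says_yes else 1.0 - p_yes


def posterior_prefers_yes(says_yes, p_pressure, prior=0.5, fidelity=0.9):
    """P(prefers yes | utterance), marginalizing over whether pressure occurred."""

    def likelihood(prefers_yes):
        # Mix the pressured and unpressured cases by the probability of pressure.
        return (
            p_pressure * utterance_likelihood(says_yes, prefers_yes, True, fidelity)
            + (1.0 - p_pressure)
            * utterance_likelihood(says_yes, prefers_yes, False, fidelity)
        )

    numerator = prior * likelihood(True)
    evidence = numerator + (1.0 - prior) * likelihood(False)
    if evidence == 0.0:
        # E.g. hearing "no" when pressure is certain is impossible in this model.
        raise ValueError("utterance has zero probability under the model")
    return numerator / evidence


if __name__ == "__main__":
    for p_pressure in (0.0, 0.5, 1.0):
        posterior = posterior_prefers_yes(True, p_pressure)
        print(f"P(pressure)={p_pressure:.1f} -> P(prefers yes | said yes)={posterior:.3f}")
```

When pressure is known to be absent, hearing “yes” moves the posterior from 0.5 to 0.9. When pressure is known to be present, the posterior stays at the prior of 0.5, which is the screening-off effect. In between, how much can be learned depends entirely on how well the model estimates the probability of pressure.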
- paulfchristiano 30 Oct 2015 14:56 UTC
  2 points
  Yes, we can’t build models today that reliably make these kinds of inferences. But if we consider a model which is architecturally identical, yet improved far enough to make good predictions, it seems like it would be able to make this kind of inference.
  
  As Stuart points out, the hard part is pointing to the part of the model that you want to access. But for that you don’t have to define “freely, unpressured and unmanipulated.” For example, it would be sufficient to describe any environment that is free of pressure, rather than defining pressure in a precise way.