I’m not claiming (in the parent comment) that values aren’t learnable.
I am claiming that they are not constrained by rationality (or rather, that this is a reasonable position to have, corresponding roughly to moral anti-realism).
I was talking about terminal values, not instrumental values. I certainly agree that if we take terminal values as given, instrumental values are an empirical fact about reality.
Though I think I see my misunderstanding now. I thought you were claiming that humans arrived at their values by a process of Bayesian updating on what their values should be. But actually what you’re claiming is that to the extent that human beliefs (not values!) are based on correct Bayesian reasoning with shared origins, distributional shifts shouldn’t exist. Humans may still disagree on values.
I was confused because your original comment used the assumption that human values were based on correct Bayesian reasoning. Am I correct that you meant that assumption to apply to human beliefs?
Sorry, I needed to clarify my thinking and my claim a lot further. This is in addition to the (what I assumed was obvious) claim that correct Bayesian thinkers should be able to converge on beliefs despite potentially having different values. I’m speculating that if terminal values are initially drawn from a known distribution, AND “if you think that a different set of life experiences means that you are a different person with different values,” AND values change based on experiences in understandable ways, then rational humans will act coherently enough that we should expect to be able to learn human values and their distribution, despite the existence of shifts.
Conditional on those speculative thoughts, I disagree with your conclusion that “that’s a really good reason to assume that the whole framework of getting the true human utility function is doomed.” Instead, I think we should be able to infer the distribution of values that humans actually have—even if they individually change over time from experiences.
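To make the “infer the distribution, not the individual” idea concrete, here is a minimal sketch under toy assumptions of my own (not anything from the thread): each person’s terminal value is a single scalar drawn from a population distribution, we observe one noisy signal per person, and we recover the population mean with a conjugate Normal–Normal update. All names and parameters are illustrative.

```python
import random

random.seed(0)

# Toy assumption: each person's "value" is a scalar drawn from
# Normal(true_mu, tau); we see one noisy observation per person.
true_mu, tau = 2.0, 1.0   # population distribution of values
noise = 0.5               # per-person observation noise

people = [random.gauss(true_mu, tau) for _ in range(2000)]
observations = [random.gauss(v, noise) for v in people]

# Conjugate posterior for the population mean, with a broad
# Normal(0, 10^2) prior; each observation has total variance
# tau^2 + noise^2 (person-level spread plus observation noise).
prior_mu, prior_var = 0.0, 100.0
obs_var = tau**2 + noise**2
n = len(observations)
post_var = 1.0 / (1.0 / prior_var + n / obs_var)
post_mu = post_var * (prior_mu / prior_var + sum(observations) / obs_var)

print(round(post_mu, 1))  # recovers something close to true_mu
```

The point of the sketch is just that individual variation (and drift) doesn’t block inference about the distribution itself, which is all my claim requires.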
But what do you optimize then?
That’s an important question, but it’s also fundamentally hard, since it’s almost certainly true that human values are inconsistent—if not individually, then at an aggregate level. (You can’t reconcile opposite preferences, or maximize each person’s share of a finite resource.)
The best answer I have seen is Eric Drexler’s discussion of Pareto-topia, where he suggests that we can make huge progress and gains in utility according to all value systems held by humans, despite the fact that they are inconsistent.
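A toy illustration of the idea (my framing, not Drexler’s formalism): two agents have strictly opposed preferences over how to split a fixed resource, so no reallocation of a fixed pie satisfies both—yet growing the pie improves both utilities at once. The utility functions and numbers here are made up for illustration.

```python
# Two inconsistent value systems over (A's share, total resources):
def utility_a(share_a, total):
    return share_a * total          # A cares about A's absolute wealth

def utility_b(share_a, total):
    return (1 - share_a) * total    # B cares about B's absolute wealth

def pareto_improves(old, new, utilities):
    """True iff every value system strictly prefers `new` to `old`."""
    return all(u(*new) > u(*old) for u in utilities)

status_quo = (0.5, 100)   # even split of a fixed pie
reallocate = (0.6, 100)   # shift share toward A: helps A, hurts B
grow_pie   = (0.5, 150)   # same split, larger economy: helps both

print(pareto_improves(status_quo, reallocate, [utility_a, utility_b]))  # False
print(pareto_improves(status_quo, grow_pie, [utility_a, utility_b]))    # True
```

This is why inconsistency of the value systems doesn’t rule out large shared gains: Pareto moves sidestep the need to reconcile the rankings where they conflict.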
That seems right. Though if you accept that human values are inconsistent and you won’t be able to optimize them directly, I still think “that’s a really good reason to assume that the whole framework of getting the true human utility function is doomed.”
By “true human utility function” I really do mean a single function that when perfectly maximized leads to the optimal outcome.
I think “human values are inconsistent” and “people with different experiences will have different values” and “there are distributional shifts which cause humans to be different than they would otherwise have been” are all different ways of pointing at the same problem.