This seems to be missing what I see as the strongest argument for “utopia”: most of what we think of as “bad values” in humans comes from objective mistakes in reasoning about the world and about moral philosophy, rather than from a part of us that is orthogonal to such reasoning in a paperclip-maximizer-like way, and future reflection can be expected to correct those mistakes.
I’m pretty worried that this won’t happen, because these aren’t “innocent” mistakes. Copying from a comment elsewhere:
Why did the Malagasy people have such a silly belief? Why do many people have very silly beliefs today? (Among the least politically risky ones to cite: someone I’ve known for years, who is otherwise intelligent and successful, currently believes, or at least believed in the recent past, that 2/3 of everyone will die as a result of taking the COVID vaccines.) I think the unfortunate answer is that people are motivated to, or are reliably caused to, have certain false beliefs as part of the status games that they’re playing. I wrote about one such dynamic, but that’s probably not a complete account.
From another comment on why reflection might not fix the mistakes:
many people are not motivated to do “rational reflection on morality” or examine their value systems to see if they would “survive full logical and empirical information”. In fact they’re motivated to do the opposite, to protect their value systems against such reflection/examination. I’m worried that alignment researchers are not worried enough that if an alignment scheme causes the AI to just “do what the user wants”, that could cause a lock-in of crazy value systems that wouldn’t survive full logical and empirical information.
One crucial question is, assuming AI will enable value lock-in when humans want it, will they use that as part of their signaling/status games? In other words, will they try to obtain higher status within their group by asking their AIs to lock in their morally relevant empirical or philosophical beliefs? A lot of people in the past used visible attempts at value lock-in (constantly going to church to reinforce their beliefs, avoiding talking with any skeptics/heretics, etc.) for signaling. Will that change when real lock-in becomes available?
Yeah, I’m particularly worried about the second comment/last paragraph—people not actually wanting to improve their values, or only wanting to improve them in ways we think are not actually an improvement (e.g. wanting to have purer faith).
Is this making a claim about moral realism? If so, why wouldn’t it apply to a paperclip maximiser? If not, how do we distinguish between objective mistakes and value disagreements?
I interpreted steven0461 to be saying that many apparent “value disagreements” between humans turn out, upon reflection, to be disagreements about facts rather than values. It’s a classic illustration of the conflict theory vs. mistake theory distinction: people are interpreted as having different values because they favor different strategies, even when everyone actually shares the same values.
ah yeah, so the claim is something like ‘if we think other humans have “bad values”, maybe in fact our values are the same and one of us is mistaken, and we’ll get less mistaken over time’?
I guess I was kind of subsuming this into ‘benevolent values have become more common’
I tend to want to split “value drift” into “change in the mapping from (possible beliefs about logical and empirical questions) to (implied values)” and “change in beliefs about logical and empirical questions”, instead of lumping both into “change in values”.
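To make that split concrete (my own toy notation, not anything from the above): write an agent’s effective values as V = f(B), where B is the agent’s logical and empirical beliefs and f is the mapping from beliefs to implied values. Then one kind of drift is a change in f while B stays fixed, and the other is a change in B while f stays fixed; calling both “change in values” obscures which of the two actually moved.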
Could the same also be true of most “good values”? Maybe people just make mistakes about almost everything.