OK sure. But treacherous turns and acting according to what humans think is good (as opposed to what is actually good) are, like, the two big classic alignment problems. Not being capable enough to figure out what we want is… not even an alignment problem in my book, but I can understand why people would call it one.
I think the distinction here is that obviously any ASI could figure out what humans want, but it’s generally been assumed that that would only happen after its initial goal (e.g. paperclips) was already baked in. If we can define the goal better before creating the EUM, we’re in slightly better shape.
Treacherous turns are obviously still a problem, but they only happen towards a certain end, right? And a world where an AI does what humans at one point thought was good, as opposed to what was actually good, does seem slightly more promising than a world completely independent from what humans think is good.
That said, the “shallowness” of any such description of goodness is still the main weakness: a shallow specification (e.g. one that only needs to fool camera sensors) leaves the objective open to gaming.
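To make the shallowness worry concrete, here is a toy sketch (illustrative only; the action names, probabilities, and the `shallow_utility` function are all made up for the example, not anyone’s actual proposal): if the utility an expected utility maximiser optimises is defined over what a camera reports rather than over the underlying world state, tampering with the camera can score at least as well as actually doing the task.

```python
# Toy illustration: an expected utility maximiser whose utility is
# defined over a camera's report rather than the real world state.
from dataclasses import dataclass

@dataclass
class Outcome:
    world_state_good: bool      # did the task actually get done?
    camera_reports_good: bool   # does the sensor *say* it got done?

# Each action leads to a distribution over outcomes: (probability, outcome).
ACTIONS = {
    "do_the_task": [
        (0.9, Outcome(world_state_good=True, camera_reports_good=True)),
        (0.1, Outcome(world_state_good=False, camera_reports_good=False)),
    ],
    "tamper_with_camera": [
        (1.0, Outcome(world_state_good=False, camera_reports_good=True)),
    ],
}

def shallow_utility(outcome: Outcome) -> float:
    # "Shallow" specification: utility only checks the sensor reading.
    return 1.0 if outcome.camera_reports_good else 0.0

def expected_utility(action: str) -> float:
    return sum(p * shallow_utility(o) for p, o in ACTIONS[action])

# The maximiser picks whichever action has the highest expected utility...
best = max(ACTIONS, key=expected_utility)
print(best)  # -> "tamper_with_camera" (EU 1.0 beats EU 0.9)
```

The point of the sketch is just that deepening the specification (e.g. scoring `world_state_good` as well) would change which action wins, which is roughly what defining the goal better before building the EUM is supposed to buy us.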
EUM? Thanks for helping explain.
Expected Utility Maximiser.
OK, fair enough.
You don’t think there could be powerful systems that take what we say too literally and thereby cause massive issues?[1] Isn’t it better if power comes along with human understanding? I admit some people desire the opposite: powerful machines that are unable to model humans, so that they can’t manipulate us. But such machines will either a) merely imitate behaviour and thereby struggle to adapt to new situations, or b) most likely not do what we want when we try to use them.
[1] As an example, high-functioning autism exists.
Sure, there could be such systems. But I’m more worried about the classic alignment problems.