Planned summary for the Alignment Newsletter:

At a very high level, we can model powerful AI systems as moving closer and closer to omniscience. As we move in that direction, what becomes the new constraint on technology? This post argues that the constraint is _good interfaces_, that is, something that allows us to specify what the AI should do. As with most interfaces, the primary challenge is dealing with the discrepancy between the user’s abstractions (how humans think about the world) and the AI system’s abstractions, which could be very alien to us (e.g. perhaps the AI system uses detailed low-level simulations). The author believes that this is the central problem of AI alignment: how to translate between these abstractions in a way that accurately preserves meaning.
The post goes through a few ways that we could attempt to do this translation, but all of them seem to only reduce the amount of translation that is necessary: none of them solve the chicken-and-egg problem of how you do the very first translation between the abstractions.
Planned opinion:
I like this view on alignment, but I don’t know if I would call it the _central_ problem of alignment. It sure seems important that the AI is _optimizing_ something: this is what prevents solutions like “make sure the AI has an undo button / off switch”, which would be my preferred line of attack if the main source of AI risk were bad translations between abstractions. There’s a longer discussion on this point here.
(I might change the opinion based on further replies to my other comment.)
If you haven’t already, you might want to see my answer to Steve’s comment, on why translation to low-level structure is the right problem to think about even if the AI is using higher-level models.
I did see that answer and pretty strongly agree with it; the “low-level structure” part of my summary was meant to be an example, not a central case. To make this clearer, I changed
> which could potentially be detailed accurate low-level simulations

to

> which could be very alien to us (e.g. perhaps the AI system uses detailed low-level simulations)
Endorsed; that definitely captures the key ideas.