Planned summary for the Alignment Newsletter:

At a very high level, we can model powerful AI systems as moving closer and closer to omniscience. As we move in that direction, what becomes the new constraint on technology? This post argues that the constraint is _good interfaces_, that is, something that allows us to specify what the AI should do. As with most interfaces, the primary challenge is dealing with the discrepancy between the user’s abstractions (how humans think about the world) and the AI system’s abstractions, which could be very alien to us (e.g. perhaps the AI system uses detailed low-level simulations). The author believes that this is the central problem of AI alignment: how to translate between these abstractions in a way that accurately preserves meaning.
The post goes through a few ways that we could attempt to do this translation, but all of them seem to only reduce the amount of translation that is necessary: none of them solve the chicken-and-egg problem of how you do the very first translation between the abstractions.
Planned opinion:
I like this view on alignment, but I don’t know if I would call it the _central_ problem of alignment. It sure seems important that the AI is _optimizing_ something: this is what prevents solutions like “make sure the AI has an undo button / off switch”, which would be my preferred line of attack if the main source of AI risk were bad translations between abstractions. There’s a longer discussion on this point here.
(I might change the opinion based on further replies to my other comment.)
If you haven’t already, you might want to see my answer to Steve’s comment, on why translation to low-level structure is the right problem to think about even if the AI is using higher-level models.
I did see that answer and pretty strongly agree with it; the “low-level structure” part of my summary was meant to be an example, not a central case. To make this clearer, I changed
> which could potentially be detailed accurate low-level simulations

to

> which could be very alien to us (e.g. perhaps the AI system uses detailed low-level simulations)
Endorsed; that definitely captures the key ideas.