I’ve been following this discussion from Jan’s first post, and I’ve been enjoying it. I’ve put together some pictures to explain what I see in this discussion.
A first take on the original misalignment argument might look something like this:
This is fair as a first take, and through a utility-function optimisation lens we might frame it like this:
Here, cultural values are the local environment that we're optimising for.
As Jacob mentions, humans are still very effective general optimisers if we look directly at how well our behaviour matches evolution's utility function. This calls for a new model.
Here's what I think actually happens:
In environmental terms, that can be pictured something like this:
Based on this model, what is cultural (human) evolution telling us about misalignment?
We have adopted proxy values (Y1, Y2, …, YN), i.e. culture, in order to optimise for X, i.e. IGF (inclusive genetic fitness). In other words, the shard of cultural values developed as a more efficient optimisation target in the new environment, where different tribes applied optimisation pressure on each other.
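The proxy-values point can be sketched as a toy numeric example (my own illustration with made-up objective functions, not anything from the original posts): an agent that greedily optimises a proxy Y looks aligned with the true target X in the environment where the proxy formed, but the same policy scores worse on X once the environment shifts and the proxy–target correlation breaks.

```python
# Toy sketch of proxy misalignment. All functions here are invented
# stand-ins: true_X plays the role of IGF, proxy_Y the role of a
# cultural proxy value learned in the old environment.

def true_X(a, env_shift=0.0):
    # True target (stand-in for IGF): peaks at a = 1 + env_shift.
    return -(a - (1.0 + env_shift)) ** 2

def proxy_Y(a):
    # Proxy value shaped by the old environment: peaks at a = 1.
    return -(a - 1.0) ** 2

def best_action(objective, actions):
    # Greedy "optimiser": pick the action scoring highest on the objective.
    return max(actions, key=objective)

actions = [i / 10 for i in range(-20, 41)]  # candidate actions in [-2, 4]

a_star = best_action(proxy_Y, actions)   # proxy-optimal action: a = 1.0
print(true_X(a_star))                    # old environment: 0.0 (also X-optimal)
print(true_X(a_star, env_shift=2.0))     # shifted environment: -4.0 (misaligned)
```

The point of the sketch is only that proxy optimisation and target optimisation are indistinguishable while the environment keeps them correlated; the divergence only shows up after the shift.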
Also, I really enjoy the book The Secret of Our Success when thinking about these models, as it provides some very nice evidence about human cultural evolution.
I will say that I thought Connor Leahy's talk on ML Street Talk was amazing, and that we should, if possible, get Connor onto Lex Fridman's podcast.
The dude looks like a tech wizard and is smart, funny, charming and a short-timeline doomer. What else do you want?
Anyway, we should create a council of charming doomers or something and send them at the media; it would be very epic. (I am in full agreement with this post, btw.)