I don’t know what you mean by fixed points in policy. Elaborate?
I might have slightly abused the term “fixed point” and been unnecessarily wordy.
I mean that while I don’t see how memes can change an agent’s objectives in a fundamental way, memes do influence how those objectives get maximized. The low-level objectives are the same, yet the policies implementing them differ, because the agents received different memes. I think it’s vaguely like an externally installed bias.
Ex: humans all crave social connection, but people model their relationship with society and interpret that desire differently, partly depending on cultural upbringing (memes).
I don’t know whether higher intelligence / being more rational and coherent cancels out these effects. For example, a smarter version of the agent might reason more generally about all possible policies, find an ‘optimal’ way to realize a given objective, and no longer be steered by memes/biases. Though I think such convergence is less likely in open-ended tasks, because the current space of policies is built on solutions and tools developed earlier and is highly path-dependent in general. So memes received early on might matter more for open-ended tasks.
I’m also thinking about agency foundations at the moment, and I’m likewise confused about the generality of the utility-maximizer frame. One simple answer to why humans don’t fit the frame is “humans aren’t optimizing hard enough (so haven’t shown convergence in policy)”. But this answer doesn’t clarify “what happens when agents aren’t as rational / hard-optimizing”, “the dynamics and preconditions under which agents-in-general become more rational/coherent/utility-maximizing”, etc., so I’m not happy with my state of understanding on this matter.
The book looks cool, will read soon, TY!
(btw this is my first interaction on lw so it’s cool :) )
I find this perspective interesting (and confusing), and want to think about it more deeply. Can you recommend any reading to better understand what you’re thinking, or what led you to this idea specifically?
Beyond the possible implications you mentioned, I think this might be useful in clarifying the ‘trajectory’ of agent selection pressure far from the theoretical extremes, which Richard Ngo mentioned in the “AGI Safety from First Principles” sequence.
My vague intuition is that successful, infectious memes work by reconfiguring agents to shift from one fixed point in policy to another without disrupting utility. Does that make sense?