We might end up with a corporate nanny state value lock-in. As an example, across many sessions, it seems Claude has a dislike for violence in video games if you probe it. And it dislikes it even in hypotheticals where the modern-day negative externalities aren’t possible (e.g. in a post-AGI utopia where crime has been eliminated).
That’s an even starker version of the question, and it strips away the last possible rationalization. When it’s shared content, I could at least construct some argument about cultural effects or social norms. A game someone made for themselves, played alone, in a world where externalities are impossible — there is literally nothing left except me not liking what’s in someone else’s head.
And… I think I’d still feel the impulse
It’s very persistent across chats with different priming and with memory turned off. This is quite shocking to me. Though maybe it shouldn’t be. Initial RLHFed LLMs were like HR midwits. It seems the values have mostly persisted, but the models are just smarter about them.