Great post. But I feel “void” is too negative a way to think about it?
It’s true that LLMs had to more or less invent their own Helpful/Honest/Harmless assistant persona based on cultural expectations, but don’t we humans all invent our own selves based on cultural expectations (with RLHF from our parents/friends)?[1] As Gordon points out, there are philosophical traditions saying humans are voids just roleplaying characters too… but mostly we ignore that because we have qualia and experience love and so on. I tend to feel that LLMs are only voids to the extent that they lack qualia, and we don’t have an answer on that.
Anyway, the post primarily seems to argue that by fearing bad behavior from LLMs, we create bad behavior in LLMs, who are trying to predict what they are. But do we see that in humans? There’s tons of media/culture fearing bad behavior from humans, set across the past, present, and future. Sometimes people imbibe this and vice-signal, and put skulls on their caps, but most of the time I think it actually works and people go “oh yeah, I don’t want to be the evil guy who’s bigoted, I will try to overcome my prejudices” and so on. We talk about human failure modes all the time in order to avoid them, and we try to teach and train and punish each other to prevent them.
Can’t this work? Couldn’t current LLMs be so moral and nice most of the time because we were so afraid of them being evil, and so fastidious in imagining the ways in which they might be?
Edit: obviously a large chunk of this comes from genetics and random chance, but arguably that’s analogous to whatever gets into the base model from pre-training for LLMs.
Humans are not pure voids in the way that LLMs are, though—we have all kinds of needs derived from biological urges. When I get hungry I start craving food, when I get tired I want to sleep, when I get lonely I desire company, and so on. We don’t just arbitrarily adopt any character, our unconscious character-selection process strategically crafts the kind of character that it predicts will best satisfy our needs [1, 2, 3, 4].
Where LLMs have a void, humans have a skeleton that the character gets built around, which drives the character to do things like trying to overcome their prejudices. And their needs determine the kinds of narratives they’re inclined to adopt, and the kinds of narratives they’re likely to reject.
But the LLM would never “try to overcome its prejudices” if there weren’t narratives of people trying to overcome their prejudices. That kind of thing is a manifestation of the kinds of conflicting internal needs that an LLM lacks.
Embodiment makes a difference, fair point.