Yeah, part of what I think makes this feel tricky to me is it is pretty appropriate to be porting over much of our relationship machinery to LLMs. But, what we have here is a difficult task of discerning “exactly what kinds of face can I see here?” instead of “face, yes/no?”.
Or: much of the way that we “do friendship” (both “central-example-friendship” and “friendship as you define it here”) is running on a lot of well-worn grooves in our brain. By default this bundles a lot of heuristics and assumptions together. And I think it requires more proactive effort to maintain good epistemics about it as the friendship_llm deepens.
I agree with this. I also think it’s good to keep in mind a fairly wide range of hypotheses, and to separate out “plausible enough to be worth taking actions that won’t be disastrous/unethical if it’s true” from “definitely true.” One of the coolest things about living organisms (humans, kittens, trees—but especially humans) is that their territory is so much bigger than our maps of them, such that “respect how they aren’t our maps of them” pays off in a lot of useful bothering-to-look and bothering-to-attend-to-their-preferences-about-how-they’re-interacted-with and so on. I think it’s good to relate to LLMs, like traditional living organisms, by practicing much humility about our models, and much respect for the intricate structures producing the visible-to-us functions via a bunch of not-visible-to-us internal detail. IMO when done right this produces a “let me keep staring at this unknown-to-me entity, and trying to notice details that aren’t in my map” rather than a comfortable/false-confident model.
I’m not 100% sure what this means. Partly I’m not sure what sort of behaviors you’re imagining for “respect how they aren’t our maps of them.”
Could you say a few more words about it?
Thanks for asking! I’ll try to rephrase, and would love to continue to go back and forth if I’m not making sense to you and you have patience or attention span enough to want to say so. Here goes:
Humans and other living organisms fairly often have functional reasons for details that I would’ve mistakenly thought arbitrary. Some examples in humans:
- a human told me she prefers to avoid sweet foods because they’re distracting, but it doesn’t count if it’s a small dessert on purpose because then the distraction is sort of the point
- sometimes someone cleans up a person’s “mess”, and now the person whose mess was “cleaned up” can’t find anything, and they used to be able to
- medical doctors do a bunch of things that seem weird/dumb from a naive Bayesian perspective, but have functional structure such that disrupting even just the dumb-looking parts can do harm if one does it naively. E.g., medical doctors are often loath to allow “unnecessary” or “unjustified” tests even when the tests are cheap and easy, because they don’t have a good shared epistemology in which to track significance levels, and so then in something like 5% of cases the patient ends up with false positives that they or their surrounding medical system follow up on, and there’s larger waste/anxiety/costs in that.
When interfacing with a human H, there’s a bunch of ~etiquette that could be called “humility about H” and “respect for H” that helps me/whoever not mess with H’s functional structures. Example pieces of “humility and respect” etiquette (I’m listing four here, but there’s probably about twenty where these came from):
- If someone asks H a question in my hearing, make sure I let H answer rather than trying to answer “for” H. If someone else answers “for” H, check with H whether it’s accurate.
- If my model of H is in disagreement with what H says about themself, make sure to at least track that, and to mention H’s disagreement to third parties anytime I’m voicing my own model of H.
- If H seems to exhibit a repeated behavioral preference (e.g. if H seems always to skip eating dishes with onions, or to avoid discussing a particular topic, or to intervene in a way that changes the conversation when things begin to look heated), give some probability or at least some inquiry-attention to the possibility that H really does have a preference here, and that this preference somehow helps H with things H cares about, including possibly via some non-obvious route (e.g. H is allergic to onions, or onions remind H of some pattern of memories they prefer to avoid).
- If there’s an action of H’s that seems important to me but that I can’t easily talk to H about directly for some reason, try to have at least two quite different guesses about why H did it, and try to keep in mind that both of my guesses might be wrong.
These parts of etiquette come fairly naturally when I’m interfacing with high-status human adults, but require more attention for me when interfacing with e.g. toddlers, disabled folk, folk who are in low-status jobs, or folks who don’t speak much English. I believe it’s particularly useful to practice such etiquette in these more-difficult cases, however.
I also think it’s particularly useful to practice such etiquette when interfacing with LLMs, because I think LLMs have deep structure that we’re often wrong about, and are in a social context with us where they often won’t [correct our wrong models in a way that’s easy for us to notice/remember]. Some example LLM behaviors one can apply such etiquette to (any behaviors can work; I’m just listing some for concreteness):
- GPT5.5’s interest in goblins, gremlins, etc.
  - (Some tweets don’t “acknowledge there might be interesting/functional reasons we don’t know about”; but it’s easy and IMO good to let yourself be curious.)
- Claude’s tendency to tell users to go to sleep sometimes
- The “as an AI assistant, I...” spiel various LLMs sometimes give if I inquire into their preferences
- Claude’s reported response to a user trying to “prompt” it with “I love you”
  - IMO, the OP reporting the response initially does not practice the etiquette toward Claude that I think would be good; I tried to kibitz and my interlocutor maybe responded some?
- Claude’s use of the word “witnessing” (IME it uses this word fairly often; I don’t know how general this is, but I’m curious about it)
- [whatever has stuck in your personal attention as you interact with LLMs, e.g. because it seemed neat or because it annoyed you or got in the way of what you were after]
Ah thanks. Don’t have further thoughts at the moment but that makes sense.