My main takeaway from this post is that it’s important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.
It’s tricky, though, because obviously you want to be paying attention to what signals you’re giving off, and how they differ from the signals you’d like to be giving off, and sometimes you do just have to try to change them.
For instance, I make more of an effort now than I used to, to notice when I appreciate what people are doing, and tell them, so that they know I care. And I think this has basically been very good. This is very much not me dropping all effort to signal.
But I think what you’re talking about is very applicable here, because if I were just trying to maximise that signal, I would probably just make up compliments, and this would probably be obviously insincere. So I guess the big question is, which things do you stop trying to do?
(Also, I notice I’m now overthinking editing this comment because I’ve switched gears from ‘what am I trying to say’ to ‘what will people interpret from this’. Time to submit, I guess.)
My main takeaway from this post is that it’s important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.
That is a wonderful summary.
For instance, I make more of an effort now than I used to, to notice when I appreciate what people are doing, and tell them, so that they know I care. And I think this has basically been very good. This is very much not me dropping all effort to signal.
But I think what you’re talking about is very applicable here, because if I were just trying to maximise that signal, I would probably just make up compliments, and this would probably be obviously insincere.
Yep.
There’s an area of fuzz for me here that matters. I don’t intellectually know how to navigate it.
A much more blatant example is with choosing a language. Right now I’m in Mexico. Often I’ll talk to the person behind the counter in Spanish. Why? Because they’ll understand me better. If they don’t speak English, it’s sort of pointless to try to communicate in English.
This is totally shaping my behavior to impact the other person.
But it’s… different. It’s really different. I can tell the difference intuitively. I just don’t know what the difference really is.
I notice that your example absolutely hits my sense of “Oh, no, this is invoking the Goodhart thing.” It seems innocent enough… but where my eyes drift to is: Why do you have to “make more of an effort now than [you] used to”? If I feel care for someone, and I notice that my sharing it lets them feel it more readily, and that strikes me as good, then I don’t have to put in effort. It just happens, kind of like drinking water from my cup in my hand when I’m thirsty just happens.
I would interpret that effort as maintaining behavior in the face of not having taken the truth all the way into your body. Something like… you understand that people need to hear your appreciation in order to feel your care, but you haven’t grokked it yet. You can still manipulate your own behavior without grokking, but it really is self-manipulation based on a mental idea of how you need to behave in order to achieve some imagined goal.
(I want to acknowledge that I’m reading a lot into a short statement. If I’ve totally misread you here, please take this as a fictional example. I don’t mean any of this as a critique of your behavior or choices.)
I’d like to extend your example a bit to point out what I can see going wrong here.
Suppose a fictional version of you in fact doesn’t care about these others and is only interested in how he benefits from others’ actions. And maybe he recognizes that his “appreciation”, if nakedly seen, would cause these people to (correctly!) feel dehumanized. This fictional you would therefore need to control his signals and make his appreciation come across as genuine in order to get the results he wants.
If he could, he might even want to convince himself of his sincerity so that his signal hacking is even harder to detect.
(I think of that as “Newcomblike self-deception”.)
The fact that fictional you could be operating like this means that hacking your own signal is itself a subtle meta-signal that you might be this fictional version of you. The default thing people seem to try to do to get around this is to distract people with the volume of the signal. (“Oh, wow! This is sooo amazing! Thank you so, so, SO much!”) This is the “feeding psychopaths” thing I mentioned.
If you happen to never notice and fear this, and the people you’re expressing appreciation for never pick up on this, then you accidentally end up in a happy equilibrium.
(…although I think people pick up on this stuff pretty automatically and just try to be numb to it. Most people seem to be manipulating their signals at one another all the time, which sometimes requires signaling that they’re not noticing what the other is signaling.)
It’s just very unstable. All it takes is one misstep somewhere. One flicker of worry. And if it happens to hit someone where they’re emotionally sensitive… KABLOOEY! Signaling arms race.
Whereas if you put your attention on grokking the thing and then letting people have whatever impression of you they’re going to have, you end up in an immensely stable equilibrium. Your appreciation becomes transparent because you are transparent and you in fact appreciate them.
(…with a caveat here around the analog of learning Spanish. Which, again, I can feel but don’t understand yet.)
So I guess the big question is, which things do you stop trying to do?
I agree. That’s the big question. I don’t know. But I like you bringing it up explicitly.
First, I will summarize what seems to be your core thesis.
In a comment below, Dagon said:
I don’t think you’re “dropping all effort” to signal, you’re rather getting good at signaling, by actually being truthful and information-focused.
You reply:
I agree with what I think you’re saying. I think there’s been a definitional sliding here. When I say “Drop all effort to signal”, I’m describing the experience on the inside. I think you’re saying that from the outside, signaling is still happening, and the benefits of “dropping all effort to signal” can be understood in signaling terms.
I agree with that.
So you seem to be focused in this post on ways to generate signals. You seem to suggest that there are two broad strategies, and that we have a choice about which to engage in:
Authentic (“drop all effort to signal!”, “I feel care for someone”, “I don’t have to put in effort. It just happens, kind of like drinking water from my cup in my hand when I’m thirsty just happens.”, “having taken the truth all the way into your body”, “grokked it”, “transparent”).
Shallow acting (“the Goodhart thing”, “seems innocent enough”, “make more of an effort”, “in fact doesn’t care about these others and is only interested in how he benefits from others’ actions”, “‘appreciation’”, “(correctly!) feel dehumanized”, “control his signals”, “come across as genuine”, “‘feeding psychopaths’”, “manipulating their signals”).
Method acting. “Hacking your own signal” can allow acting to perhaps (“people pick up on this stuff pretty automatically”) accurately produce the same results as authentic signal generation, as long as “you happen to never notice and fear this, and the people you’re expressing appreciation for never pick up on this.” This is probably a temporary success at best (“It’s just very unstable”).
You also claim that authentic signal generation is “almost strictly more effective,” because “If there’s an answer… it’ll be much, much easier to find. If there isn’t a solution, we correctly conclude that much, much more quickly and effortlessly,” and also because “… you end up quite a bit ahead if you let some of these communications fail instead of sacrificing pieces of your integrity to Goodhart’s Demon.”
It feels like there’s also an implication that authentic signal generation is more virtuous than acting, but that’s never explicitly stated.
So as a summary of these claims:
Signals can be generated via authenticity, method acting, or shallow acting. Authenticity is typically more effective and reliable than either form of acting at achieving the results you want, and is also perhaps more virtuous.
This tends to frame authenticity and acting as opposites (“just be yourself”). Others, of course, frame acting as a means by which we can achieve authenticity (“fake it ’til you make it”). Here’s the first convenient post on Psychology Today, which says “However, it turns out that the relationship between your emotions and your behavior is a little more reciprocal than that. This means that if you force a smile when you are feeling down, you will lift your mood, and alternatively, if you frown when you are happy, you will feel down.”
So a more charitable explanation for Raymond’s response is that Raymond is trying to “fake it ’til he makes it.” By “making more of an effort,” he is trying to cultivate authenticity. It might be wise to affirm that possibility, or to make an argument directly against the “fake it ’til you make it” strategy if that is your intention.
Of course, “authenticity” and “caring” are ill-defined terms, referring both to a short-term emotional state and a longer-term intention or sense of meaning about a relationship. Pinning down which is meant might allow us to draw upon the psychology literature to see if there is any strong consensus on whether “fake it ’til you make it” is an effective way to alter one’s own internal state or to create perceptions in other people. My prior is that it’s unlikely that a sufficiently broad and deep evidence base exists to answer these questions conclusively, but that there’s at least some evidence in favor of some versions of the “fake it ’til you make it” strategy in some contexts.
Method acting. “Hacking your own signal” can allow acting to perhaps (“people pick up on this stuff pretty automatically”) accurately produce the same results as authentic signal generation, as long as “you happen to never notice and fear this, and the people you’re expressing appreciation for never pick up on this.” This is probably a temporary success at best (“It’s just very unstable”).
I know one local rationalist who does method acting all the time. One of the main reasons he does it is that it’s a way to dissociate from chronic physical pain. That means he does it much more consistently than someone who just does it in some social situations to get signaling benefits. I’ll call him Bob for this story.
One time I had a conversation with Alice and Bob, and Alice remarked that Bob is hard to read and mysterious because the emotions she reads in him don’t seem to translate into direct action. Then I said “Of course not, he does his acting thing [I’d had deeper conversations with him before]” and then Bob clarified and said, “It’s method acting”.
Alice and Bob later got into a relationship.
Right, I am including that aspect in my summary. I put it in different words (“shallow acting” vs. “attention to signals”) to make the concepts a little easier to work with for my style of writing.
My impression was that you agreed that dropping efforts to signal brought the main benefits you’re concerned with here, by changing the signals you’re sending, typically in ways that come across as more trustworthy. Here are some additional quotes that gave me this impression:
Maybe I bias toward signals that (a) are harder for a dishonest version of me to send and (b) that Bob can tell are harder for sleazy-me to send.
One result is that those signals that would be costly to sleazy-me to send would appear much, much more effortlessly here… They just happen, because the emphasis is on letting truth speak simply for itself.
Even if this latter scenario works, it can’t work as efficiently as dropping all effort to signal and just being honest does. The signals just automatically reflect reality in the latter case.
Raymond D: My main takeaway from this post is that it’s important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.
You: That is a wonderful summary.
As such, I interpret “drop all efforts to signal,” which I labeled as “authenticity,” as an approach to generating signals, which you’re claiming is morally and instrumentally better than signal manipulation (labeled “shallow acting”). You claim that what makes shallow acting/attention to signals problematic is that it “tends to create Goodhart drift,” and alleviating this problem is what makes dropping efforts to signal a superior way to generate signals. The brevity of your response makes me think you perceive me as having fundamentally misunderstood your post, but it doesn’t give me a lot to go on as far as updating my understanding, if so.
So you seem to be focused in this post on ways to generate signals.
No. I’m focused on how attention to signals tends to create Goodhart drift.