One of the reasons it’s hard to take the possibility of blatant lies into account is that it would just be so very inconvenient, and also boring.
If someone’s statements are connected to reality, that gives you something to do:
You can analyze them, critique them, use them to infer the person’s underlying models and critique those, identify points of agreement and controversy, identify flaws in their thinking, make predictions and projections about future actions based on them, et cetera. All of those are activities we love a lot: they’re fun and feel useful.
It also gives you the opportunity to engage, to socialize with the person by arguing with them, or with others by discussing their words (e.g., if it’s a high-profile public person). You can show off your attention to detail and analytical prowess, build reputation and status.
On the other hand, if you assume that someone is lying (in a competent way, where you can’t easily identify which parts are the lies), that gives you… pretty much nothing to do. You’re treating their words as containing ~zero information, so you (1) can’t use them as an excuse to run some fun analyses/projections, (2) can’t use them as an opportunity to socialize and show off. All you can do is stand in the corner repeating the same thing, “this person is probably lying, do not believe them”, over and over again, while others get to have fun. It’s terrible.
Concrete example: Sam Altman. The guy would go on an interview or post some take on Twitter, and people would start breaking what he said down, pointing out what he gets right/wrong, discussing his character and vision and the X-risk implications, etc. And I would listen to the interview and read the analyses, and my main takeaway would be, “man, 90% of this is probably just lies completely decoupled from the underlying reality”. And then I have nothing to do.
Importantly: this potentially creates a community bias in favor of naivete (at least towards public figures). People who believe that Alice is a liar mostly ignore Alice,[1] so all analyses of Alice’s words mostly come from people who put some stock in them. This creates a selection effect where the vast majority of Alice-related discussion is by people who don’t dismiss her words out of hand, which makes it seem as though the community thinks Alice is trustworthy. That (1) skews your model of the community, and (2) may be taken as evidence that Alice is trustworthy by non-informed community members, who would then start trusting her and discussing her words, creating a feedback loop.
Edit: Hm, come to think of it, this point generalizes. Suppose we have two models of some phenomenon, A and B. Under A, the world frequently generates prompts for intelligent discussion, whereas discussion-prompts for B are much sparser. This would create an apparent community bias in favor of A: A-proponents would be generating most of the discussions, raising A’s visibility, and also get more opportunities for raising their own visibility and reputation. Note that this is completely decoupled from whether the aggregate evidence is in favor of A or B; the volume of information generated about A artificially raises its profile.
Example: disagreements regarding whether studying LLMs bears on the question of ASI alignment or not. People who pay attention to the results in that sphere get to have tons of intelligent discussions about an ever-growing pile of experiments and techniques. People who think LLM alignment is irrelevant mostly stay quiet, or retread the same few points they have for the hundredth time.
What else are they supposed to do? Their only message is “Alice is a liar”, and butting into conversations just to repeat this well-discussed, conversation-killing point wouldn’t feel particularly productive and would quickly start annoying people.
You’re treating their words as containing ~zero information, so you (1) can’t use them as an excuse to run some fun analyses/projections, (2) can’t use them as an opportunity to socialize and show off. All you can do is stand in the corner repeating the same thing, “this person is probably lying, do not believe them”, over and over again, while others get to have fun. It’s terrible.
Is this true? If I start talking about how Thane Ruthenis sexually assaulted me, that might be true or it might be false. If it’s true, that tells you something about the world. If it’s false, the statement doesn’t say anything about the world, but the fact that I said it still does.
Like, it would probably mean I don’t like you, or have some interest in having others not like you.
So like, the fact that I make that statement does contain information. It should raise your p(will was SA’d by Thane) and p(will doesn’t like Thane) roughly in proportion to how much you trust me.
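A minimal Bayes sketch of that “roughly in proportion to how much you trust me” point, with entirely made-up numbers (the prior and likelihoods below are illustrative placeholders, not anyone’s actual credences):

```python
# Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.
# "Trust" enters through the likelihood ratio: a trusted speaker rarely makes
# the claim when it is false, so their making it moves the probability a lot more.

def posterior(prior, p_claim_if_true, p_claim_if_false):
    """P(hypothesis | claim was made), from a prior and the two claim likelihoods."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (p_claim_if_true / p_claim_if_false)
    return posterior_odds / (1 + posterior_odds)

base_rate = 1e-4  # illustrative prior that the accusation is true

# High trust: the speaker almost never makes a false accusation.
print(posterior(base_rate, p_claim_if_true=0.5, p_claim_if_false=1e-4))  # ~0.33

# Low trust: the speaker might make the claim whether or not it is true.
print(posterior(base_rate, p_claim_if_true=0.5, p_claim_if_false=0.1))   # ~0.0005
```

The same structure applies to p(will doesn’t like Thane); only the prior and likelihoods would differ.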
Sure, but… I think one important distinction is that lies should not be interpreted as having semantic content. If you think a given text is lying, that means you can’t look at just the text and infer stuff from it, you have to look at the details of the context in which it was generated. And you often don’t have all the details, and the correct inferences often depend on them very precisely, especially in nontrivially complex situations. In those cases, I think lies do contain basically zero information.
For example:
If I start talking about how Thane Ruthenis sexually assaulted me, that might be true or it might be false. If it’s true, that tells you something about the world. If it’s false, the statement doesn’t say anything about the world, but the fact that I said it still does.
It could mean any of:
You dislike me and want to hurt me, as a terminal goal.
I’m your opponent/competitor and you want to lower my reputation, as an instrumental goal.
You want to paint yourself as a victim because it would be advantageous for you in some upcoming situation.
You want to create community drama in order to distract people from something.
You want to erode trust between members of a community, or make the community look bad to outsiders.
You want to raise the profile of a specific type of discourse.
Etc., etc. If someone doesn’t know the details of the game-theoretic context in which the utterance is emitted, there’s very little they can confidently conclude from it. They can infer that we are probably not allies (although maybe we are colluding, who knows), and that there’s some sort of conflict happening, but that’s it. For all intents and purposes, the statement (or its contents) should be ignored; it could mean almost anything.
(I suppose this also depends on Simulacra Levels. SL2 lies and SL4 lies are fairly different beasts, and the above mostly applies to SL4 lies.)
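To make the “very little they can confidently conclude” point concrete, here is a toy sketch with invented numbers: if the accusation is roughly equally likely under each of the motives listed above, observing it barely shifts the distribution over which motive is in play, even though it does signal that some conflict exists.

```python
# Toy model (all numbers invented for illustration): a prior over possible motives
# and the probability of making this accusation under each motive. When those
# likelihoods are all similar, the posterior over motives is nearly the prior,
# i.e. the utterance says little about *which* motive is behind it.
priors = {
    "hurt me (terminal)":        0.20,
    "lower my reputation":       0.20,
    "play the victim":           0.20,
    "create a distraction":      0.15,
    "erode community trust":     0.15,
    "boost a type of discourse": 0.10,
}
likelihoods = {motive: 0.5 for motive in priors}  # accusation equally likely under each

joint = {m: priors[m] * likelihoods[m] for m in priors}
total = sum(joint.values())
posterior = {m: p / total for m, p in joint.items()}

for motive, p in posterior.items():
    print(f"{motive}: prior {priors[motive]:.2f} -> posterior {p:.2f}")
```

(With very different likelihoods across motives, the same observation would be much more informative.)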
I think this is being a little uncharitable or obtuse, no offense? Unless I’m misunderstanding, which is possible.
Like, the things on the list you gave are extremely specific. Like if I said the SA thing, maybe you’d update from p(will wants to cause commotion to distract people from something) = 1e-6 (or whatever the time-adjusted base rate is) to 10%. That’s a huge amount of information. Even if your posterior probability is well below 50%.
The fact that an event has “many plausible explanations” doesn’t mean it contains no information. This seems like a very elementary fact to me.
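For a rough sense of scale (using the illustrative figures from the comment above): moving a hypothesis from 1e-6 to 10% corresponds to a very large likelihood ratio.

```python
# How much evidence does a 1e-6 -> 10% update represent?
import math

prior, post = 1e-6, 0.10
prior_odds = prior / (1 - prior)
post_odds = post / (1 - post)
bayes_factor = post_odds / prior_odds

print(f"Bayes factor: ~{bayes_factor:,.0f}")            # roughly 1.1e5
print(f"Evidence: ~{math.log2(bayes_factor):.1f} bits")  # ~16.8 bits
```

So even with a posterior well below 50%, the observation has done a lot of updating work.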
Like, the things on the list you gave are extremely specific.
I assume those were not chosen randomly from a large set of possible motivations, but because those options were somehow salient for Thane. So I would guess the priors are higher than 1e-6.
For example, I have high priors on “wants to distract people from something” for politicians, because I have seen it executed successfully a few times. The amateur version is doing it after people notice some bad thing you did, to take attention away from the scandal; the pro version is doing it shortly before you do the bad thing, so that no one even notices it, and if someone does, no one will pay attention because the cool kids are debating something else.
Okay, it was a specific (hypothetical) example where I in particular made the false claim.
What’s your current REAL p(williawa
currently wants to cause a commotion to distract lesswrong from something
currently wants to paint himself as a victim for some future gain
wants to erode trust between people on lesswrong)?
And how would you update* if I started making as credible a case as I could about so-and-so person SA-ing me? How would you update if you were sure I was lying?
I think if you don’t make an update, you’re very clearly just being irrational. And I don’t think you have any reason to update differently, or really have different priors, than Thane. I don’t think I know either of you.
So if you’re updating, Thane should be as well, irrespective of the saliency thing.
Conditional on you not making the claim (or before you make the claim) and generally not doing anything exceptional, all three probabilities seem small… I hesitate to put an exact number on them, but yeah, 1e-6 could be a reasonable value.
Comparing the three options relative to each other, I think there is no reason why someone would want to distract lesswrong from something. Wanting to erode trust seems unlikely but possible. So the greatest probability of these three would go to painting yourself as a victim, because there are people like that out there.
If you made the claim, I would probably add a fourth hypothesis, which would be that you are someone else’s second account: someone who had some kind of conflict with Thane in the past and is doing this as some kind of revenge. And of course the fifth hypothesis that the accusation is true. And a sixth hypothesis that the accusation is an exaggeration of something that actually happened.
(The details would depend on the exact accusation and Thane’s reaction. For example, if he confirmed having met you, that would remove the “someone else’s sockpuppet account” option.)
If you made the accusation (without having had this conversation), I would probably put 40% probability on each of “it happened” and “exaggeration”, and 20% on “playing victim”, with the remaining options being relatively negligible, though still more likely than if you hadn’t made the claim. The exact numbers would probably depend on my current mood and the specific words used.
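A minimal bookkeeping sketch of the split described above (the 40/40/20 figures are from this comment; the tiny weights for the remaining options are invented placeholders, and everything is renormalized so the shares sum to one):

```python
# Rough hypothesis weights for the accusation scenario, lightly renormalized.
weights = {
    "it happened":                        0.40,
    "exaggeration of something real":     0.40,
    "playing victim":                     0.20,
    "someone else's revenge sockpuppet":  0.01,  # invented placeholder weight
    "distraction / trust erosion / etc.": 0.01,  # invented placeholder weight
}
total = sum(weights.values())
for hypothesis, weight in weights.items():
    print(f"{hypothesis}: {weight / total:.1%}")
```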
Feeling called out by this relatable content.