EDIT: Missed Raemon’s reply, I agree with at least the vibe of his comment (it’s a bit stronger than what I’d have said).
Oh huh, kinda surprised my phrasing was stronger than what you’d say.
Getting into it a bit from a problem-solving angle, in a “first think about the problem for 5 minutes before proposing solutions” kinda way...
The reasons the problem is hard include:
New people keep coming in, and unless we change something significant about our new-user-acceptance process, it’s often a long process to enculturate them into even having the belief that they should be trying not to get tribally riled up.
Also, a lot of them are weaker at evaluating arguments, and are likely to upvote bad arguments for positions that they just-recently-got-excited-about. (“newly converted” syndrome)
Tribal thinking is just really ingrained, and slippery even for people putting in a moderate effort not to do it.
often, if you run a check “am I being tribal/triggered or do I really endorse this?”, there will be a significant part of you that’s running some kind of real-feeling cognition. So the check “was this justified?” returns “true” unless you’re paying attention to subtleties.
relatedly: just knowing “I’m being tribal right now, I should avoid it” doesn’t really tell you what to do instead. I notice a comment I dislike because it’s part of a political faction I think is constantly motivatedly wrong about stuff. The comment seems wrong. Do I… not downvote it? Well, I still think it’s a bad comment, it’s just that the reason it flagged itself so hard to my attention is Because Tribalism.
(or, there’s a comment with a mix of good and bad properties. Do I upvote, downvote, or leave it alone? idk. Sometimes when I’m trying to account for tribalness I find myself upvoting stuff I’d ordinarily have passed over because I’m trying to go out of my way to be gracious, but I’m not sure if that’s successfully countering a bias or just following a different one. Sometimes this results in mediocre criticism getting upvoted.)
There’s some selection effect around “triggeredness” that produces disproportionate conversations. Even if most of the time people are pretty reasonable thinking about reasonable things together, the times people get triggered (politically or otherwise) result in more/bigger/more-self-propagating conversations.
There’s an existing equilibrium where there are factions upvoting/downvoting each other, and it feels scary to leave something un-upvoted since it might get ~brigaded.
It’s easy for the voting system to handle prominence of posts. It’s a lot harder for the voting system to actually handle prominence of comments. In a high-volume comment thread, every new comment sends a notification to at least the author of the OP and/or the previous commenter, and lots of people are just checking often, so even downvoted comments are going to get seen (and people will worry that they’ll get upvoted later by another cluster of people).
(we do currently hide downvoted comments from the homepage in most contexts)
Probably there’s more.
Meanwhile, the knobs to handle this are:
extremely expensive, persistent manual moderation from people (who realistically are going to sometimes be triggered themselves)
try to detect patterns of triggeredness/tribalness, and change something about people’s voting powers or commenting powers, or what comments get displayed.
change something about the UI for upvoting.
The sort of thing that seems like an improvement is changing something about how strong upvotes work, at least in some cases. (e.g. maybe if we detect a thread has fairly obvious factions tug-of-war-ing, we turn off strong-upvoting, or add some kind of cooldown period)
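(To gesture at what that detection could even look like, here’s a very rough sketch; the data shape, thresholds, and function names are all made up for illustration, not anything we’ve actually built:)

```python
from collections import defaultdict

def faction_score(votes):
    """votes: list of (voter_id, comment_author_id, strength) tuples for one thread,
    where abs(strength) > 1 means a strong vote. Returns a crude 0-1 score: higher
    means voters keep strong-voting the same authors in the same direction (a
    tug-of-war pattern). Purely illustrative; not a real LessWrong heuristic."""
    by_pair = defaultdict(list)
    for voter, author, strength in votes:
        by_pair[(voter, author)].append(strength)

    one_sided = 0
    for strengths in by_pair.values():
        strong = [s for s in strengths if abs(s) > 1]
        # a voter repeatedly strong-voting one author, always in the same direction
        if len(strong) >= 2 and (all(s > 0 for s in strong) or all(s < 0 for s in strong)):
            one_sided += 1
    return one_sided / max(len(by_pair), 1)

def strong_votes_enabled(thread_votes, threshold=0.4):
    """If the thread looks factional, strong-votes could be disabled or rate-limited."""
    return faction_score(thread_votes) <= threshold
```

Anything like this would need real tuning, and would throw false positives on legitimately contested threads, but it gestures at the kind of signal that might be detectable at all.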
We’ve periodically talked about having strong-upvotes require giving a reason, or be otherwise constrained in some way, although I think there was disagreement about whether that’d probably be good or bad.
Oh huh, kinda surprised my phrasing was stronger than what you’d say.
Idk the “two monkey chieftains” is just very… strong, as a frame. Like of course #NotAllResearchers, and in reality even for a typical case there’s going to be some mix of object-level-epistemically-valid reasoning along with social-monkey reasoning, and so on.
Also, you both get many more observations than I do (by virtue of being in the Bay Area) and are paying more attention to extracting evidence / updates out of those observations around the social reality of AI safety research. I could believe that you’re correct, I don’t have anything to contradict it, I just haven’t looked in enough detail to come to that conclusion myself.
Tribal thinking is just really ingrained
This might be true but feels less like the heart of the problem. Imo the bigger deal is more like trapped priors:
The basic idea of a trapped prior is purely epistemic. It can happen (in theory) even in someone who doesn’t feel emotions at all. If you gather sufficient evidence that there are no polar bears near you, and your algorithm for combining prior with new experience is just a little off, then you can end up rejecting all apparent evidence of polar bears as fake, and trapping your anti-polar-bear prior.
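(To make “just a little off” concrete, here’s a toy numerical version of that dynamic, my own illustration rather than the model from the linked post: if each observation gets “perceived” as a blend of the raw evidence and what you already believe, a strong enough starting prior ratchets toward certainty even when the raw evidence points the other way.)

```python
def trapped_update(prior, raw_likelihood_ratio, weight_on_prior=0.6, steps=20):
    """Toy model of a trapped prior: the agent updates on 'perceived' evidence,
    a blend of its current belief and the raw evidence, instead of on the raw
    evidence itself. Illustrative only; not the formal model from the post."""
    odds = prior / (1 - prior)
    for _ in range(steps):
        perceived = weight_on_prior * odds + (1 - weight_on_prior) * raw_likelihood_ratio
        odds = min(odds * perceived, 1e12)  # cap to keep the toy numerically tame
    return odds / (1 + odds)

def correct_update(prior, raw_likelihood_ratio, steps=20):
    """Ordinary Bayesian updating on the raw evidence, for contrast."""
    odds = (prior / (1 - prior)) * raw_likelihood_ratio ** steps
    return odds / (1 + odds)

# Raw evidence favors "no polar bears" 4:1 per observation (likelihood ratio 0.25),
# but with a 0.9 prior and a slightly-off update rule, belief in polar bears
# climbs toward ~1.0 instead of falling toward ~0.
print(trapped_update(prior=0.9, raw_likelihood_ratio=0.25))   # ~1.0
print(correct_update(prior=0.9, raw_likelihood_ratio=0.25))   # ~0.0
```

The exact functional form doesn’t matter; the point is just that a small, systematic distortion in how evidence is taken in is enough to trap the prior.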
A person on either “side” certainly feels like they have sufficient evidence / arguments for their position (and can often list them out in detail, so it’s not pure self-deception). So premise #1 is usually satisfied.
There are tons of ways that the algorithm for combining prior with new experience can be “just a little off” to satisfy premise #2:
When you read a new post, if it’s by “your” side everything feels consistent with your worldview so you don’t notice all the ways it is locally invalid, whereas if it’s by the “other” side you intuitively notice a wrong conclusion (because it conflicts with your worldview) which then causes you to find the places where it is locally invalid.[1] If you aren’t correcting for this, your prior will be trapped.
(More broadly I think LWers greatly underestimate the extent to which almost all reasoning is locally logically invalid, and how much you have to evaluate arguments based on their context.[2])
Even when you do notice a local invalidity in “your” side, it is easy enough for you to repair so it doesn’t change your view. But if you notice a local invalidity in “their” side, you don’t know how to repair it and so it seems like a gaping hole. If you aren’t correcting for this, your prior will be trapped.[3]
When someone points out a counterargument, you note that there’s a clear slight change in position that averts the counterargument, without checking whether this should change confidence overall.[4]
The sides have different epistemic norms, so it is just actually true that the “other” side has more epistemic issues as-evaluated-by-your-norms than “your” side. If you aren’t correcting for this, your prior will be trapped.
I don’t quite know enough to pin down what the differences are, but maybe something like: “pessimists” care a lot more about precision of words and logical local validity, whereas “optimists” care a lot more about the thrust of an argument and accepted general best practices, even if you can’t explain exactly how they’re compatible with Bayesianism. Idk, I feel like this isn’t quite right.
I think this is the (much bigger) challenge that you’d want to try to solve. For example, I think LW curation decisions are systematically biased for these reasons, and that likely contributes substantially to the problem with LW group epistemics.
Given that, what kinds of solutions would I be thinking about?
Partial solution from academia: there are norms restricting people’s (influential) opinions to their domain of expertise. This creates a filter where the opinions you care about are much more likely to be the result of deep engagement with details on a given topic, and so are more likely to be correct. (Relatedly, my biggest critique of individual LW epistemics is a lack of respect for how much details matter.)
Partial solution from academia: procedural norms around what evidence you have to show for something to become “accepted knowledge” (typically enforced via peer review).[5]
For curation in particular: get some “optimists” to feed into curation decisions. (Buck, Ryan, and Lukas all seem like potential candidates, seeing as they aren’t as pessimistic as me and at least Buck + Ryan already put some effort into LW group epistemics.)
Tbc I also believe that there’s lots of straightforwardly tribal thinking going on.[6] People also mindkill themselves in ways that make them less capable of reasoning clearly.[7] But it doesn’t feel as necessary to solve. If you had a not-that-large set of good thinking going on, that feels like it could be enough (e.g. Alignment Forum at time of launch). Just let the tribes keep on tribe-ing and mostly ignore them.
I guess all of this is somewhat in conflict with my original position that sensationalism bias is a big deal for LW group epistemics. Whoops, sorry. I do still think sensationalism and tribalism biases are a big deal but on reflection I think trapped priors are a bigger deal and more of my reason for overall pessimism.
Though for sensationalism / tribalism I’d personally consider solutions as drastic as “get rid of the karma system, accept lower motivation for users to produce content, figure out something else for identifying which posts should be surfaced to readers (maybe an LLM-based system can do a decent job)” and “much stronger moderation of tribal comments, including e.g. deleting highly-upvoted EY comments that are too combative / dismissive”.
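(For the LLM-based surfacing idea, a minimal sketch of the kind of thing I mean; the model name, rubric, and helper names are placeholders, not a worked-out proposal:)

```python
# Hypothetical sketch: surface posts via an LLM rubric instead of karma.
# Everything here (model choice, rubric, thresholds) is a placeholder.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Rate this post from 1-10 on each of: novelty of argument, reasoning "
    "transparency, and engagement with counterarguments. Reply with only the "
    "average score as a number."
)

def llm_score(post_text: str) -> float:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": post_text[:8000]},  # truncate long posts
        ],
    )
    return float(response.choices[0].message.content.strip())

def frontpage(posts: list[str], k: int = 10) -> list[str]:
    """Return the k posts the rubric scores highest, ignoring votes entirely."""
    return sorted(posts, key=llm_score, reverse=True)[:k]
```

Obviously the hard part is the rubric (and keeping it from being gamed), not the plumbing.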
For example, I think this post against counting arguments reads as though the authors noticed a local invalidity in a counting argument, then commenters on the early draft pointed out that of course there was a dependence on simplicity that most people could infer from context, and then the authors threw some FUD on simplicity. (To be clear, I endorse some of the arguments in that post and not others; do not take this as me disendorsing that post entirely.)
Habryka’s commentary here seems like an example, where the literal wording of Zach’s tweet is clearly locally invalid, but I naturally read Zach’s tweet as “they’re wrong about doom being inevitable [if anyone builds it]”. (I agree it would have been better for Zach to be clearer there, but Habryka’s critique seems way too strong.)
For example, when reading the Asterisk review of IABIED (not the LW comments, the original review on Asterisk), I noticed that the review was locally incorrect because the IABIED authors don’t consider an intelligence explosion to be necessary for doom, but also I could immediately repair it to “it’s not clear why these arguments should make you confident in doom if you don’t have a very fast takeoff” (that being my position). (Tbc I haven’t read IABIED, I just know the authors’ arguments well enough to predict what the book would say.) But I expect people on the “MIRI side” would mostly note “incorrect” and fail to predict the repair. (The in-depth review, which presumably involved many hours of thought, does get as far as noting that probably Clara thinks that FOOM is needed to justify “you only get one shot”, but doesn’t really go into any depth or figure out what the repair would actually be.)
As a possible example, MacAskill quotes PC’s summary of EY as “you can’t learn anything about alignment from experimentation and failures before the critical try” but I think EY’s position is closer to “you can’t learn enough about alignment from experimentation and failures before the critical try”. Similarly see this tweet. I certainly believe that EY’s position is that you can’t learn enough, but did the author actually reflect on the various hopes for learning about alignment from experimentation and failures and update their own beliefs, or did they note that there’s a clear rebuttal and then stop thinking? (I legitimately don’t know tbc; though I’m happy to claim that often it’s more like the latter even if I don’t know in any individual case.)
During my PhD I was consistently irritated by how often peer reviewers would just completely fail to be moved by a conceptual argument. But arguably this is a feature, not a bug, along the lines of epistemic learned helplessness; if you stick to high standards of evidence that have worked well enough in the past, you’ll miss out on some real knowledge but you will be massively more resistant to incorrect-but-convincing arguments.
I was especially unimpressed about “enforcing norms on” (i.e. threatening) people if they don’t take the tribal action.
For example, “various readers may be less cautious/paranoid/afraid than me, and think that it’s worth some risk of killing every child on Earth (and everyone else) to get progress faster or to avoid the costs of getting everyone to go slow”. If you are arguing for > 90% doom “if anyone builds it”, you don’t need rhetorical jujitsu like this! (And in fact my sense is that many of the MIRI team who aren’t EY/NS equivocate a lot between “what’s needed for < 90% doom” and “what’s needed for < 1% doom”, though I’m not going to defend this claim. Seems like the sort of thing that could happen if you mindkill yourself this way.)
This is the most compelling version of “trapped priors” I’ve seen. I agreed with Anna’s comment on the original post, but the mechanisms here make sense to me as something that would mess a lot with updating. (Though it seems different enough from the very bayes-focused analysis in the original post that I’m not sure it’s referring to the same thing.)
Yeah, I agree with “trapped priors” being a major problem.
The solution this angle brings to mind is more like “subsidize comments/posts that do a good job of presenting counterarguments in a way that is less triggering / feeding into the toxoplasma”.
Making a comment on solutions to the epistemic problems: I agree with these solutions:
Partial solution from academia: there are norms restricting people’s (influential) opinions to their domain of expertise. This creates a filter where the opinions you care about are much more likely to be the result of deep engagement with details on a given topic, and so are more likely to be correct. (Relatedly, my biggest critique of individual LW epistemics is a lack of respect for how much details matter.)
For curation in particular: get some “optimists” to feed into curation decisions. (Buck, Ryan, and Lukas all seem like potential candidates, seeing as they aren’t as pessimistic as me and at least Buck + Ryan already put some effort into LW group epistemics.)
But massively disagree with this solution:
Partial solution from academia: procedural norms around what evidence you have to show for something to become “accepted knowledge” (typically enforced via peer review).[5]
My general issue here is that peer review doesn’t work nearly as well as people think it does for catching problems. In particular, I think science is advanced much more by the best theories gaining prominence than by suppressing the worst theories, and problems with bad theories taking up too much space are much better addressed at the funding level than at the theory level.