But Have They Engaged With The Arguments? [Linkpost]

Noosphere892 Sep 2025 18:25 UTC

74 points

There’s an interestingly pernicious version of a selection effect that occurs in epistemology, where people can be led into false claims because when non-believers try to engage with arguments, the unconvinced will drop out at random steps, and past a few steps or so, the believers/evangelists who believe in all the arguments will have a secure-feeling position that the arguments are right, and that people who object to the arguments are (insane/ridiculous/obviously trolling), no matter whether the claim is true:

What’s going wrong, I think, is something like this. People encounter uncommonly-believed propositions now and then, like “AI safety research is the most valuable use of philanthropic money and talent in the world” or “Sikhism is true”, and decide whether or not to investigate them further. If they decide to hear out a first round of arguments but don’t find them compelling enough, they drop out of the process. (Let’s say that how compelling an argument seems is its “true strength” plus some random, mean-zero error.) If they do find the arguments compelling enough, they consider further investigation worth their time. They then tell the evangelist (or search engine or whatever) why they still object to the claim, and the evangelist (or whatever) brings a second round of arguments in reply. The process repeats.

As should be clear, this process can, after a few iterations, produce a situation in which most of those who have engaged with the arguments for a claim beyond some depth believe in it. But this is just because of the filtering mechanism: the deeper arguments were only ever exposed to people who were already, coincidentally, persuaded by the initial arguments. If people were chosen at random and forced to hear out all the arguments, most would not be persuaded.

Perhaps more disturbingly, if the case for the claim in question is presented as a long fuzzy inference, with each step seeming plausible on its own, individuals will drop out of the process by rejecting the argument at random steps, each of which most observers would accept. Believers will then be in the extremely secure-feeling position of knowing not only that most people who engage with the arguments are believers, but even that, for any particular skeptic, her particular reason for skepticism seems false to almost everyone who knows its counterargument.

In particular, if we combine this with a heavy tailed distribution of performance at fields, where people have exponential-drop off in intelligence, meaning that a few people matter a lot more in progress than most people, it means that it’s very difficult to distinguish cases where the small/insular group arguing for something extreme relative to their current distribution is correct and everyone else doesn’t get the arguments/data, and the cases where the small group is being fooled by a selection effect and the conclusion is actually false.

I’ll just quote it in full, since there’s no other better way to summarize this/link to it:

Yeah. In science the association with things like scientific output, prizes, things like that, there’s a strong correlation and it seems like an exponential effect. It’s not a binary drop-off. There would be levels at which people cannot learn the relevant fields, they can’t keep the skills in mind faster than they forget them. It’s not a divide where there’s Einstein and the group that is 10 times as populous as that just can’t do it. Or the group that’s 100 times as populous as that suddenly can’t do it. The ability to do the things earlier with less evidence and such falls off at a faster rate in Mathematics and theoretical Physics and such than in most fields.

Yes, people would have discovered general relativity just from the overwhelming data and other people would have done it after Einstein.

No, that intuition is not necessarily correct. Machine learning certainly is an area that rewards ability but it’s also a field where empirics and engineering have been enormously influential. If you’re drawing the correlations compared to theoretical physics and pure mathematics, I think you’ll find a lower correlation with cognitive ability.

There’s obviously implications for our belief in AI risk/AI power in general, but this is pretty applicable to a lot of fields, and probably explains at least some of the skepticism lots of people have towards groups that make weird/surprising/extreme claims (relative to their world model).

What links here?

The main way I’ve seen people turn ideologically crazy [Linkpost] by Noosphere89 (23 Oct 2025 20:09 UTC; 111 points)

Noosphere892 Sep 2025 18:25 UTC

74 points

17 comments2 min readLW link

World Modeling Epistemology Rationality

Raemon 2 Sep 2025 21:55 UTC
16 points
0
This seems like it’s engaging with the question of “what do critics think?” in a sort of model-free, uninformed, “who to defer to” sort of way.
For awhile, I didn’t fully update on arguments for AI Risk being a Big Deal because the arguments were kinda complex and I could imagine clever arguers convincing me of it without it being true. One of the things that updated me over the course of 4 years was actually reading the replies (including by people like Hanson) and thinking “man, they didn’t seem to even understand or address the main points.”
i.e. it’s not that they didn’t engage with the arguments, it’s that they engaged with the arguments badly which lowered my credence on taking their opinion seriously.
(I think nowadays I have seen some critics who do seem to me to have engaged with most of the real points. None of their counterarguments seem like they’ve added up to “AI is not a huge fucking deal that is extremely risky” in a way that makes any sense to me, but, some of them add up to alternate frames of looking at the problem that might shift what is the best thing(s) to do about it)
What links here?
- The main way I’ve seen people turn ideologically crazy [Linkpost] by Noosphere89 (23 Oct 2025 20:09 UTC; 111 points)
- Noosphere89 3 Sep 2025 2:09 UTC
  5 points
  2
  Parent
  ~~I agree that critics engaging with arguments badly is an update towards the arguments being real~~, but I am essentially claiming that the fact that this selection effect exists and is very difficult to eliminate/reduce to a useful level implicitly means that you can only get a very limited amount of evidence from arguments.
  One particular part of my model here is that selection effects are usually very strong and difficult to eliminate by default, unfortunately, and thus one of the central problems of science in general is how to deal with this sort of effect.
  But it’s nice to hear from you on how you’ve come to believe in AI risk being a big deal.
  Edit: I wrote a linkpost on the main way people turn ideologically crazy that explains why you can only get a very limited amount of evidence from arguments, and retracted the statement that “I agree that critics engaging with arguments badly is an update towards the arguments being real”.
  What links here?
  - The main way I’ve seen people turn ideologically crazy [Linkpost] by Noosphere89 (23 Oct 2025 20:09 UTC; 111 points)
james oofou 4 Sep 2025 10:41 UTC
14 points
5
How good is the argument for an AI moratorium? Tools exist which would help us get to the bottom of this question. Obviously, the argument first needs to be laid out clearly. Once we have the argument laid out clearly, we can subject it to the tools of analytic philosophy.
But I’ve looked far-and-wide and, surprisingly, have not found any serious attempt at laying the argument out in a way that makes it easily susceptible to analysis.
Here’s an off-the-cuff attempt:
P1. ASI may not be far off
P2. ASI would be capable of exterminating humanity
P3. We do not know how to create an aligned ASI
P4. If we create ASI before knowing how to align ASI, the ASI will ~certainly be unaligned
P5. Unaligned ASI would decide to exterminate humanity
P6. Humanity being exterminated by ASI would be a bad thing
C. Humanity should implement a moratorium on AI research until we know how to create an aligned ASI
My off-the-cuff formulation of the argument is obviously far too minimal to be helpful. Each premise has a wide literature associated with it and should itself have an argument presented for it (and the phrasing and structure can certainly be refined).
If we had a canonical formulation of the argument for an AI moratorium, the quality of discourse would immediately, immensely improve.
Instead of constantly talking past each other, retreading old ground, and spending large amounts of mental effort just trying to figure out what exactly the argument for a moratorium even is, one can say “my issue is with P6”. Their interlocutor would respond “What’s your issue with the argument for P6?”, and the person would say “Subpremise 4, because it’s question-begging”, and then they are in the perfect position for an actually very productive conversation!
I’m shocked that this project has not already been carried out. I’m happy to lead such a project if anyone wants to fund it.
- StanislavKrym 8 Sep 2025 4:54 UTC
  1 point
  −1
  Parent
  Didn’t The Problem try to do something similar by summarizing the essay in the following five bullet points:
  The summary
  Key points in this document:
  There isn’t a ceiling at human-level capabilities.
  ASI is very likely to exhibit goal-oriented behavior.
  ASI is very likely to pursue the wrong goals.
  It would be lethally dangerous to build ASIs that have the wrong goals.
  Catastrophe can be averted via a sufficiently aggressive policy response.
  Each point is a link to the corresponding section.
  - Noosphere89 8 Sep 2025 20:12 UTC
    3 points
    0
    Parent
    I basically agree with the 1st and 2nd points, somewhat disagree with the 3rd point (I do consider it plausible that ASIs develop goals that are incompatible with human survival, but I don’t think it’s very likely), ~~the 4th point is right but the argument is locally invalid,~~ ~~because processor clock speeds are not how fast AIs think~~, and I basically agree with the point that sufficiently aggressive policy responses can avert catastrophe, but don’t agree with the premise that wait and see is utterly unviable for AI tech, and also disagree with the premise that ASI is a global suicide bomb.
  - james oofou 8 Sep 2025 13:50 UTC
    2 points
    0
    Parent
    Someone approaches you with a question:
    “I have read everything I could find that rationalists have written on AI safety. I came across many interesting ideas, I studied them carefully until I understood them well, and I am convinced that many are correct. Now I’m ready to see how all the pieces fit together to show that an AI moratorium is the correct course of action. To be clear, I don’t mean a document written for the layperson, or any other kind of introductory document. I’m ready for the real stuff now. Show me your actual argument in all its glory. Don’t hold back.”
    After some careful consideration, you:
    (a) helpfully provide a link to A List of Lethalities
    (b) suggest that he read the sequences
    (c) patiently explain that if he was smart enough to understand the argument then he would have already figured it out for himself
    (d) leave him on read
    (e) explain that the real argument was written once, but it has since been taken down, and unfortunately nobody’s gotten around to rehosting it since
    (f) provide a link to a page which presents a sound argument[0] in favour of an AI moratorium
    ===
    Hopefully, the best response here is obvious. But currently no such page exists.
    It’s a stretch to expect to be taken seriously without such a page.
    [0]By this I mean an argument whose premises are all correct and which collectively entail the conclusion that an AI moratorium should be implemented.
    - ceba 25 Oct 2025 3:19 UTC
      1 point
      0
      Parent
      I infer StanislavKrym’s reply isn’t what you’re looking for. Could you explain why? It’s not obvious to me
Noosphere89 2 Sep 2025 19:55 UTC
10 points
5
@1a3orn goes deeper into another dynamic that causes groups to have false beliefs while believing they are true, and it’s the fact that some bullshit beliefs help you figure out who to exclude, which is the people who don’t currently hold the belief, and in particular assholery also helps people who don’t want their claims checked, and it’s a reason I think politeness is actually useful in practice for rationality:
(Sharmake’s first tweet): I wrote something on a general version of this selection effect, and why it’s so hard to evaluate surprising/extreme claims relative to your beliefs, and it’s even harder if we expect heavy-tailed performance, as happens in our universe.
(1a3orn’s claims) This is good. I think another important aspect of the multi-stage dynamic here is that it predicts that movements with *worse* stages at some point have fewer contrary arguments at later points...
...and in this respect is like an advance-fee scam, where deliberately non-credible aspects of the story help filter people early on so that only people apt to buy-in reach later parts.
Paper on Why do Nigerian Scammers Say They are from Nigeria?
So it might be adaptive (survivalwise) for a memeplex to have some bullshit beliefs because the filtering effect of these means that there will be fewer refutations of the rest of the beliefs.
It can also be adaptive (survivalwise) for a leader of some belief system to be abrasive, an asshole, etc, because fewer people will bother reading them ⇒ “wow look how no one can refute my arguments”
(Sharmake’s response) I didn’t cover the case where the belief structure is set up as a scam, and instead focused on where even if we are assuming LWers are trying to get at truth and aren’t adversarial, the very fact that this effect exists combined with heavy-tails makes it hard to evaluate claims.
But good points anyway.
(1a3orn’s final point)
Yeah tbc, I think that if you just blindly run natural selection over belief systems, you get belief systems shaped like this regardless of the intentions of the people inside it. It’s just an effective structure.
Quotes from this tweet thread.
p.b. 3 Sep 2025 8:33 UTC
6 points
1
There’s an interestingly pernicious version of a selection effect that occurs in epistemology, where people can be led into false claims because when people try to engage with arguments, people will drop out at random steps, and past a few steps or so, the people who believe in all the arguments will have a secure-feeling position that the arguments are right, and that people who object to the arguments are (insane/ridiculous/obviously trolling), no matter whether the claim is true:
I find this difficult to parse: people, people, people, people, people.
These seem to be at least three different kind of people: The evangelists, the unconvinced (who drop out) and the believers (who don’t drop out). Not clearly distinguishing between these groups makes the whole post more confusing than necessary.
- Noosphere89 3 Sep 2025 13:52 UTC
  5 points
  0
  Parent
  Rewrote this paragraph to this:
  
  There’s an interestingly pernicious version of a selection effect that occurs in epistemology, where people can be led into false claims because when non-believers try to engage with arguments, the unconvinced will drop out at random steps, and past a few steps or so, the believers/evangelists who believe in all the arguments will have a secure-feeling position that the arguments are right, and that people who object to the arguments are (insane/ridiculous/obviously trolling), no matter whether the claim is true:
DirectedEvolution 3 Sep 2025 3:05 UTC
5 points
1
The majority of those who best know the arguments for and against thinking that a given social movement is the world’s most important cause… are presumably members of that social movement.
Knowing the arguments for and against X being the World’s Most Important Cause (WMIC) is fully compatible with concluding X is not the WMIC, even a priori. And deeply engaging with arguments about any X being the WMIC is an unusual activity, characteristic of Effective Altruism. If you do that activity a lot, then it’s likely you know the arguments for and against many causes, making it unlikely you’re a member of all causes for which you know the arguments for and against.
If they decide to hear out a first round of arguments but don’t find them compelling enough, they drop out of the process.
The simple hurdle model presented by OP implies that there is tremendous leverage in coming up with just one more true argument against a flawed position. Presented with it, a substantial number of the small remaining number of true believers in the flawed position will accept it and change their mind. My perception is that this is not at all what we typically assume when arguing with a true believer in some minority position—we expect that they are especially resistant to changing their mind.
I think a commonsense point of view is that true believers in flawed positions got there under the influence of systematic biases that dramatically increased the likelihood that they would adopt a flawed view. Belief in a range of conspiracy theories and pseudoscientific views appear to be correlated both in social groups and within individuals, which would support the hypothesis of systematic biases accounting for the existence of minority groups holding a common flawed belief. Possibly, their numbers are increased by a few unlucky reasoners who are relatively unbiased but made a series of unfortunate reasoning mistakes, and will hopefully see the light when presented with the next accurate argument.
Noosphere89 7 Sep 2025 23:19 UTC
3 points
0
BTW, this sort of selection effect combined with the massive uncertainties around AI is a big part of why my p(doom) is below 50%, and why I will never be as confident as MIRI as a whole until maybe the last few months.
You could reasonably argue that how to deal with selection effects are the Hamming question for a lot of science fields, and fields that ignore/don’t deal with it will predictably produce lots of nonsense (as seen in nutrition studies/most social science and psychology studies).
- StanislavKrym 8 Sep 2025 6:09 UTC
  8 points
  0
  Parent
  I suspect that selection effects can be dealt with by easy access to a ground truth. One wouldn’t need to be Einstein to calculate how Mercury’s perihelion would behave according to Newton’s theory. In reality the perihelion rotates at a different rate with no classical reason in sight, so Newton’s theory had to be replaced by something else.
  Nutrition studies and psychology studies are likely difficult because they require careful approach to avoid a biased selection of people subjected to the investigation. And social studies are, as their name suggests, supposed to study the evolution of whole societies and find it hard to construct a big data set or to create a control group. In addition, humanities-related fields^[1] could also be easier affected by death spirals.
  Returning to AI alignment, we don’t have any AGIs yet, we only have philosophical arguments which arguably prevent some alignment methods and/or targets^[2] from scaling to the ASI. However, we do have LLMs which we can finetune and whose outputs and chains of thought we can read.
  Studying the LLMs already yields worrying results, including the facts that LLMs engage in things like self-preservation, alignment faking or reward hacking.
  1. ^
    Nutrition studies are supposed to predict how food consumed by a person affects the person’s health. I doubt that one can use such studies for ideology-related goals or that these studies can be affected by a death spiral.
  2. ^
    SOTA discourse around AI alignment assumes that the AI can be aligned to nearly every imaginable target, including ensuring that its hosts grab absolute power and hold it forever without even needing to care about others. Which is amoral, but one conjecture in the AI-2027 forecast has Agent-3 develop moral reasoning.
  What links here?
  - The main way I’ve seen people turn ideologically crazy [Linkpost] by Noosphere89 (23 Oct 2025 20:09 UTC; 111 points)
  - Noosphere89 8 Sep 2025 16:50 UTC
    6 points
    1
    Parent
    For what it’s worth, I agree that empirical results have made me worry more relative to last year, and it’s part of the reason I no longer have p(doom) below 1-5%.
    But there are other important premises which I don’t think are supported well by empirics, and are arguably load-bearing for the confidence that people have.
    One useful example from Paul Christiano is there’s a conflation between solving the alignment problem on the first critical try, and not being able to experiment at all, and while this makes AI governance way harder, it doesn’t make the science problem nearly as difficult:
    Eliezer often equivocates between “you have to get alignment right on the first ‘critical’ try” and “you can’t learn anything about alignment from experimentation and failures before the critical try.” This distinction is very important, and I agree with the former but disagree with the latter. Solving a scientific problem without being able to learn from experiments and failures is incredibly hard. But we will be able to learn a lot about alignment from experiments and trial and error; I think we can get a lot of feedback about what works and deploy more traditional R&D methodology. We have toy models of alignment failures, we have standards for interpretability that we can’t yet meet, and we have theoretical questions we can’t yet answer.. The difference is that reality doesn’t force us to solve the problem, or tell us clearly which analogies are the right ones, and so it’s possible for us to push ahead and build AGI without solving alignment. Overall this consideration seems like it makes the institutional problem vastly harder, but does not have such a large effect on the scientific problem.
    From this list of disagreements
    I mostly agree with the rest of your comment.
Noosphere89 24 Oct 2025 16:57 UTC
2 points
0
I wrote a second linkpost titled the main way I’ve seen people turn ideologically crazy, which responds to @Raemon’s comment and gives practical advice for how to deal with selection effects when you have to ignore most critics, meaning the automatic safeguards against falsehood are deactivated.
Charlie Steiner 3 Sep 2025 2:02 UTC
2 points
0
Great post.
I wanted to pick on the model of “sequentially hear out arguments, then stop when get fed up with one,” but I think it doesn’t make too much difference compared to a more spread-out model where people engage with all the arguments but at different rates, and get fed up globally rather than locally.
ceba 25 Oct 2025 3:08 UTC
1 point
0
I may have experienced this. I was reading a recent discussion about AGI doom, where Eliezer Yudkowsky and others were debating whether one could use aligned human-level AGI to solve alignment before strong ASI is developed.
After reading this thread, I went for a walk and thought about it.
- The no arguments seemed straightforward and elegant in comparison to the yes arguments, which seemed contingent on on lots of little details.
- Straightforward and elegant ideas often represent reality better in my experience. Is that why yes seems more convincing?
- Perhaps instead it’s because the yes arguments fit in my head better
- But didn’t I engage with the arguments? I read them, tried to understand, and remained unconvinced.
I still haven’t resolved this. Did I do the dumb thing?
Discussion heavily using a metaphor about dragons, from the last 3 months, does anyone recall? I looked briefly.