I’m a rising junior at the University of Chicago, where I co-run the EA group, founded the rationality group, and am studying philosophy and economics/cognitive science. I’m largely interested in formal epistemology, metaethics, formal ethics, and decision theory, with minor interests in a few other areas. I think LessWrong ideas are heavily underrated in philosophy academia, though I have some contentions. I also have a blog where I post about philosophy (and other stuff sometimes) here: https://substack.com/@irrationalitycommunity?utm_source=user-menu.
Experts currently treat being persuaded as reasonably good evidence that something is true — their judgment is calibrated enough that when they find an argument convincing, that’s correlated with the argument actually being correct. This allows them to update readily in light of new evidence, and is a big part of how intellectual progress happens: lots of innovation and advances in basically every subject come down to experts taking sometimes weird new ideas seriously.
One worry I have about superpersuasive AI is that it could erode this. If a superpersuasive AI can convince experts of things regardless of whether those things are true, experts may stop treating their own persuasion as good evidence that something is true — and start treating it the way laypeople do. Laypeople are typically hesitant to adopt new, truth-tracking beliefs in light of new information, and (to some degree) rationally so: the fact that someone was able to convince a layperson of something is just not very strong evidence that it is in fact true. Experts might end up in the same position — updating only rarely, and in ways that are often unrelated to the truth.
This would be quite bad. If experts lose their capacity to reliably update on genuine evidence, we could significantly slow the rate of intellectual progress (which could be very important for making AI go well!). This is, I think, an underappreciated argument for caring about AI for epistemics — curious what others think.
A software intelligence explosion might be asymmetrically good for safety if you think safety research is absolute > relative:
Ajeya Cotra and others have argued that ML research being automated first could mean that safety work gets a boost during a software intelligence explosion. The typical worry is that this doesn’t help if capabilities race ahead. But I think the picture is more nuanced depending on how you model safety progress — and the distinction matters not just for intelligence explosion dynamics but for how you allocate resources between safety approaches today.
There are roughly two ways to think about safety difficulty:
Relative: safety is hard insofar as capabilities keep moving the goalposts. More capable systems mean harder alignment problems, more deceptive alignment risk, etc.
Absolute: safety is about hitting some fixed technical bar — interpretability reaching a certain threshold, formal verification of certain properties, etc. — largely independent of how fast capabilities move.
If you lean toward the absolute view, an intelligence explosion looks surprisingly good for safety. You’re not just getting more capable AI systems — you’re getting dramatically better alignment researchers, better interpretability tools, faster progress on whatever technical problems currently bottleneck safety. The “bar” doesn’t move much, but your ability to clear it does.
The obvious counterexample is control, where the relative view pretty clearly dominates. More capable systems have more situational awareness, are better at strategic deception, better at identifying and exploiting oversight gaps. Capability gains directly translate into harder control problems, roughly one-for-one or worse.
Mech interp might be partially in the relative camp too, since understanding what a model is doing gets harder as the model gets better at obfuscating or reasoning in ways that outpace our interpretive tools. This feels less clearly true than for control — it seems to require more situational awareness and steps of reasoning on the model’s part — but it’s an open empirical question worth investigating.
One implication for resource allocation: if you think a given safety approach is more relative, its value degrades as capabilities advance, even if it looks competitive today. Suppose mech interp has a 1% chance of success and control has a 2% chance over the next x years. Control looks better naively — but if the capability-to-safety ratio gets significantly worse over a large portion of that window, and control’s difficulty tracks capabilities more closely, mech interp could still be the better investment. A similar (though somewhat distinct) dynamic applies when comparing general vs. narrow safety research: even if a more general approach looks worse by current benchmarks, it may dominate if you expect the landscape of future models to shift in ways that erode the value of narrow solutions.
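Here’s a minimal numerical sketch of that comparison (the discount function and every parameter are invented for illustration; this is not a real model of safety progress):

```python
# Toy model: an "absolute" approach keeps its headline success probability,
# while a "relative" approach's probability erodes as capabilities grow.

def effective_prob(base_prob: float, capability_growth: float, relativity: float) -> float:
    """Discount base_prob by how tightly the approach's difficulty tracks
    capabilities; relativity=0 is fully absolute, relativity=1 fully relative."""
    return base_prob / (1 + relativity * capability_growth)

growth = 3.0  # how much harder capabilities make things over the window (made up)

mech_interp = effective_prob(base_prob=0.01, capability_growth=growth, relativity=0.2)
control = effective_prob(base_prob=0.02, capability_growth=growth, relativity=1.0)

print(f"mech interp: {mech_interp:.4f}")  # ~0.006
print(f"control:     {control:.4f}")      # 0.0050
```

Under these made-up parameters, control’s 2x headline advantage is more than eaten by its tighter coupling to capabilities.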
(Open to hearing why this is totally wrong, has been stated many times before, or not useful, ofc)
This might feel obvious, but I think it’s under-appreciated how much disagreement on AI progress just comes down to priors (in a pretty specific way) rather than object-level reasoning.
I was recently arguing the case for shorter timelines to a friend who leans longer. We kept disagreeing on a surprising number of object-level claims, which was weird because we usually agree on this kind of thing.
Then I realized what I think was going on: she had a pretty strong prior against what I was saying, and that prior is abstract enough that there’s no clear mechanism by which I can push against it. So whenever I made a good object-level case, she’d just take the other side — not necessarily because her reasons were better, all else equal, but because the prior was doing the work underneath without either of us really noticing.
There’s something clearly rational here that’s kinda unintuitive to get a grip on. If you have a strong prior, and someone makes a persuasive argument against it, but you can’t identify the specific mechanism by which their argument defeats it, you should probably update that the arguments against their case are better than they appear, even if you can’t articulate them yet. From the outside, this totally just looks like motivated reasoning (and often is), but I think it can be pretty importantly different.
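To make the rational version concrete, here’s a minimal Bayes calculation with invented numbers (nothing here is a real estimate):

```python
# Minimal Bayesian sketch: a strong prior can rationally survive a genuinely
# persuasive argument, because the posterior reflects both. Numbers invented.

prior = 0.90          # strong prior credence in long timelines
likelihood_ratio = 4  # P(hearing this argument | short timelines) /
                      # P(hearing this argument | long timelines)

prior_odds = prior / (1 - prior)                # 9:1 in favor of long timelines
posterior_odds = prior_odds / likelihood_ratio  # the argument cuts the odds by 4x
posterior = posterior_odds / (1 + posterior_odds)

print(f"posterior credence in long timelines: {posterior:.2f}")  # ~0.69
```

She should update toward short timelines, but still favor long ones, which from the outside can look a lot like not being moved by a good argument.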
The reason this is so hard to disentangle is that (unless your belief web is extremely clear to you, which seems practically impossible) it’s just enormously complicated. Your prior on timelines isn’t an isolated thing — it’s load-bearing for a bunch of downstream beliefs all at once. So the resistance isn’t obviously irrational; it’s more like… the system protecting its own coherence.
I think this means people should try their best to disentangle whether some object-level argument they’re having turns on genuine object-level beliefs or on pretty abstract priors (in which case, it seems less worthwhile to press on them).
Yea. In light of this, someone should start an AIS replication org 👀👀
Confidence level: strongly held, mostly opinionated, based on observation of (imo) bad LW norms.
We should stop using the phrase “epistemic status” and start using “confidence level.” In principle, “epistemic status” is meant to convey richer meta-information than confidence alone (e.g. the kind of evidence behind a claim, or how seriously it should be taken). In practice, it almost never does—on LW it’s usually just a clunkier way of saying “x confidence.”
If we actually want to convey more with less, we should just say “confidence level” and briefly qualify it with the relevant epistemic details (e.g. “low confidence, based on analogy,” or “high confidence, but mostly theoretical”). That’s clearer, less in-group-y, and lower friction. I think this is a good way to save some weirdness points.
(Alternatively, one could use “qualified confidence”—a bit more jargony, but traded for a bit more accuracy—though I personally like “confidence level” most).
I was pretty unimpressed with Dario Amodei in the recent conversation with Demis Hassabis at the World Economic Forum about what comes after AGI.
I don’t know how much of this is a publicity thing, but it felt like he wasn’t really taking the original reasons for going into AI seriously (i.e. reducing x-risk). The overall message seemed to be “full speed ahead,” justified mostly by some hand-wavy arguments about geopolitics, with the doomier risks acknowledged only in passing. Bummer.
As someone who runs a (university) rationality group, I am pretty unsure about this point. While we started off being more accommodating to those who hadn’t done the reading (starting with someone basically summarizing it), at some point—for reasons unknown to me—the standard changed, and now everyone does the readings by default. Not accommodating people, then, seems like something that pushes people in the right direction.
In case anyone wants it, Rob Long wrote an excellent summary and analysis of this paper here.
I appreciate the memetic-evolution framing, but I’m somewhat skeptical of the strong emphasis on tension-reduction as the primary (or even a major) explanatory driver of successionist beliefs. Given that you take successionism to be “false and dangerous,” it seems natural that your preferred explanation foregrounds memetics; but that sits a bit uneasily with the goal you state at the beginning: analyzing why people hold these views irrespective of their truth value.
Even if we bracket the object level, a purely memetic or cognitive-dissonance-based explanation risks drifting into an overly broad epistemic relativism/skepticism. On many accounts of epistemic justification—process reliabilism being one—what makes a belief justified is precisely that it’s formed by a reliable process. If we exclude the possibility that people arrive at their views through such processes and instead explain them almost entirely via dissonance-reduction pressures, we risk undermining (almost) all belief formation, not just things like successionism.
There’s a related danger: sociological/memetic explanations of belief formation can easily shade into ad hominem-esque critiques if not handled carefully (of course, some forms of ad hominem—i.e. pointing to someone’s likelihood of arriving at a true belief—are evidentially relevant, but they’re bad for epistemic hygiene and discourse). One could tell a similar story about why people believe in, say, AI x-risk—Tyler Cowen has suggested that part of the appeal is the feeling of possessing secret, high-stakes insight. While this may capture a fragment of the causal picture for some individuals, it’s clearly not the dominant explanation for most thoughtful, epistemically serious people. And if it were the main cause, we would be right to distrust the resulting beliefs—yet the memetic story doesn’t seem any more convincing as an explanation in one case than in the other (unless you already think one view is false and the other true).
So while memetic fitness and tension-resolution offer part of an explanation, I’m not convinced they do most of the work for most people. For most, object-level reasoning—about value theory, metaethics, consciousness, agency, and long-run trajectories—plays a substantial role in why they end up where they do. To the extent that successionist ideologies spread, part of that spread will track memetic dynamics, but part will also track genuine and often rigorous attempts to reason about the future of value and the structure of possible worlds.
Curious what people think about this, though; very open to constructive criticism, and I don’t feel very confident about it.
While I think LW’s epistemic culture is better than most, one thing that seems pretty bad is that occasionally mediocre/shitty posts get lots of upvotes simply because they’re written by [insert popular rationalist thinker].
Of course, if LW were truly meritocratic (which it should be), this shouldn’t matter — but in my experience, it descriptively does.
Without naming anyone (since that would be unproductive), I wanted to know whether others notice this too. And aside from simply trying not to upvote something because it’s written by a popular author, does anyone have good ideas for preventing this?
“Albania has introduced its first artificial intelligence “minister”, who addressed parliament on Thursday in a debut speech.” lol, what???
Not sure how much this really matters vs is just a PR thing, but it’s maybe something people on here should know about.
Thanks for the comment!
I hear the critique, but I’m not sure I’m as confident as you are that it’s a good one.
The first reason is that I’m not sure the credibility gained from having a wiki page wouldn’t outweigh the loss of control.
The second reason is that I don’t really think there is much loss of control (except in extreme cases like the ones you mention): you can’t be super ideological on wiki sites, beyond saying things like “and here’s what critics say.” On that point, I think it’s just pretty important for the standard article on a topic to include critiques of it (as long as they are honest/good rebuttals, which I’m somewhat confident the wiki moderators can ensure). Another point: LWers can just stay on top of the page to ensure the information isn’t clearly outdated or confused.
Curious to hear pushback, though.
Good call—that was from an earlier version of this post.
Good point, and I think I somewhat agree. If you think we just reach an intelligence explosion at some capability level (seems pretty plausible), you wouldn’t update all the way back to pre-training-only expectations, because we’d be closer and what really matters is hitting that point (and post-training can possibly take you there). So while you shouldn’t update all the way back, I still think the general point, that this is a large update back, still stands (the degree probably depends on some other priors, though, which I didn’t want to get into).
In a similar vein, I’ve always thought Chesterton’s fence reasoning was a bit self-defeating—in that using Chesterton’s fence as a conceptual tool is, in itself, often breaking it. People often do the traditional thing for cultural, familial, or religious reasons. While I understand it’s a heuristic and this doesn’t actually undermine the fence, it seems like an underrated point.
I saw this good talk on the Manifest YouTube channel about using historical circumstances to calibrate predictions—this seems better for training than regular forecasting because you have a faster feedback loop between prediction and resolution.
I wanted to know if anyone has recommendations for software or a site where I can do more examples of this (I already know about the Estimation Game). I would do this myself, but it seems like it would be pretty difficult to research a historical situation without learning the outcome. I would also appreciate takes on why this might be a bad way to get better at forecasting.
Here’s an argument against this view: yes, there is some cost to helping the citizens of a country, and the benefit becomes less great as you become a rentier state. However, while the benefits go down and economic prosperity accrues to the very few due to AGI, the cost of providing a good quality of life to others in the society also becomes significantly cheaper. It is not clear that the rate at which the benefits diminish actually outpaces the reduction in the costs of helping people.
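A toy numerical sketch of that rates comparison (every rate here is invented; the point is only that what matters is relative rates of decline, not decline per se):

```python
# Toy sketch: the incentive to help citizens declines post-AGI, but so does
# the cost of helping. Whether helping stays "worth it" depends on which
# declines faster. All rates are made up for illustration.

def net_value_of_helping(t: int, benefit_decay: float, cost_decay: float) -> float:
    benefit = 1.00 * (1 - benefit_decay) ** t  # value placed on helping citizens
    cost = 0.50 * (1 - cost_decay) ** t        # cost of providing a good life
    return benefit - cost

for t in range(0, 21, 5):
    print(t, round(net_value_of_helping(t, benefit_decay=0.10, cost_decay=0.30), 3))
# Net value stays positive here because costs fall faster than the incentive.
```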
In response to this argument, one might say that regular people become totally obsolete efficiency-wise, and the costs, while reduced, stay positive. However, this really depends on how you think human psychology works—while some people would turn on humans the second they can, there are likely some people who will just keep being empathetic (perhaps this is merely a vestigial trait from the past, but that’s irrelevant—the value exists now, and some people might be willing to pay some cost to act on it even beyond their own lives). We have a similar situation in our world: namely, animals. While people aren’t motivated to care about animals for power reasons (they could do all the factory farming they want, and it would be better for them), some still do (I take it that this is a vestigial trait of generalizing empathy to the abstract, but as stated, the story of how it came to be seems largely irrelevant).
Because of how cheap it is to actually help someone in this world, you may need only one or a few people to care just a little bit about helping others, and that could make everyone better off. Given that we have a bunch of vegans now (the equivalent of empathetic but powerful people post-AGI), and depending on how low the costs of making lives happy are (presumably there is a negative correlation between the cost of making lives better and the inequality of power, money, etc.), it might be the case that regular citizens end up pretty alright on the other side.
Curious what people think about this!
Also, many of the links at the beginning (YouTube, World Bank, rentier states, etc.) don’t work.
Makes sense. Good clarification!
I think people should know that this exists (Sam Harris arguing that misaligned AI is an x-risk concern, on the Big Think YouTube channel):
Credit also matters because it helps us identify which interventions actually produce good outcomes. For example, understanding whether I or 80,000 Hours was responsible for someone in my EA group securing a high-impact job informs where we should allocate resources to most effectively place more people in such roles.