I’m a rising junior at the University of Chicago, where I co-run the EA group, founded the rationality group, and am studying philosophy and economics/cognitive science. I’m largely interested in formal epistemology, metaethics, formal ethics, and decision theory, with minor interests in a few other areas. I think LessWrong ideas are heavily underrated in philosophy academia, though I have some contentions. I also have a blog where I post about philosophy (and other stuff sometimes) here: https://substack.com/@irrationalitycommunity?utm_source=user-menu.
Experts currently treat being persuaded as reasonably good evidence that something is true — their judgment is calibrated enough that when they find an argument convincing, that’s correlated with the argument actually being correct. This allows them to update readily in light of new evidence, and is a big part of how intellectual progress happens: lots of innovation and advances in basically every subject come down to experts taking sometimes weird new ideas seriously.
One worry I have about superpersuasive AI is that it could erode this. If a superpersuasive AI can convince experts of things regardless of whether those things are true, experts may stop treating their own persuasion as good evidence that something is true — and start treating it the way laypeople do. Laypeople are typically hesitant to adopt new, truth-tracking beliefs in light of new information, and (to some degree) rationally so: the fact that someone was able to convince a layperson of something is just not very strong evidence that it is in fact true. Experts might end up in the same position — updating only rarely, and in ways that are often unrelated to the truth.
This would be quite bad. If experts lose their capacity to reliably update on genuine evidence, we could significantly slow the rate of intellectual progress (which could be very important for making AI go well!). This is, I think, an underappreciated argument for caring about AI for epistemics — curious what others think.
A software intelligence explosion might be asymmetrically good for safety if you think safety research is absolute > relative:
Ajeya Cotra and others have argued that ML research being automated first could mean that safety work gets a boost during a software intelligence explosion. The typical worry is that this doesn’t help if capabilities race ahead. But I think the picture is more nuanced depending on how you model safety progress — and the distinction matters not just for intelligence explosion dynamics but for how you allocate resources between safety approaches today.
There are roughly two ways to think about safety difficulty:
Relative: safety is hard insofar as capabilities keep moving the goalposts. More capable systems mean harder alignment problems, more deceptive alignment risk, etc.
Absolute: safety is about hitting some fixed technical bar — interpretability reaching a certain threshold, formal verification of certain properties, etc. — largely independent of how fast capabilities move.
If you lean toward the absolute view, an intelligence explosion looks surprisingly good for safety. You’re not just getting more capable AI systems — you’re getting dramatically better alignment researchers, better interpretability tools, faster progress on whatever technical problems currently bottleneck safety. The “bar” doesn’t move much, but your ability to clear it does.
The obvious counterexample is control, where the relative view pretty clearly dominates. More capable systems have more situational awareness, are better at strategic deception, better at identifying and exploiting oversight gaps. Capability gains directly translate into harder control problems, roughly one-for-one or worse.
Mech interp might be partially in the relative camp too, since understanding what a model is doing gets harder as the model gets better at obfuscating or reasoning in ways that outpace our interpretive tools. This feels less clearly true than for control — it seems to require more situational awareness and steps of reasoning on the model’s part — but it’s an open empirical question worth investigating.
One implication for resource allocation: if you think a given safety approach is more relative, its value degrades as capabilities advance, even if it looks competitive today. Suppose mech interp has a 1% chance of success and control has a 2% chance over the next x years. Control looks better naively — but if the capability-to-safety ratio gets significantly worse over a large portion of that window, and control’s difficulty tracks capabilities more closely, mech interp could still be the better investment. A similar (though somewhat distinct) dynamic applies when comparing general vs. narrow safety research: even if a more general approach looks worse by current benchmarks, it may dominate if you expect the landscape of future models to shift in ways that erode the value of narrow solutions.
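Here’s a minimal numerical sketch of that comparison (the discount function and every parameter are invented for illustration; this is not a real model of safety progress):

```python
# Toy model: an "absolute" approach keeps its headline success probability,
# while a "relative" approach's probability erodes as capabilities grow.

def effective_prob(base_prob: float, capability_growth: float, relativity: float) -> float:
    """Discount base_prob by how tightly the approach's difficulty tracks
    capabilities; relativity=0 is fully absolute, relativity=1 fully relative."""
    return base_prob / (1 + relativity * capability_growth)

growth = 3.0  # how much harder capabilities make things over the window (made up)

mech_interp = effective_prob(base_prob=0.01, capability_growth=growth, relativity=0.2)
control = effective_prob(base_prob=0.02, capability_growth=growth, relativity=1.0)

print(f"mech interp: {mech_interp:.4f}")  # ~0.006
print(f"control:     {control:.4f}")      # 0.0050
```

Under these made-up parameters, control’s 2x headline advantage is more than eaten by its tighter coupling to capabilities.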
(Open to hearing why this is totally wrong, has been stated many times before, or not useful, ofc)
This might feel obvious, but I think it’s under-appreciated how much disagreement on AI progress just comes down to priors (in a pretty specific way) rather than object-level reasoning.
I was recently arguing the case for shorter timelines to a friend who leans longer. We kept disagreeing on a surprising number of object-level claims, which was weird because we usually agree on this kind of thing.
Then I realized what I think was going on: she had a pretty strong prior against what I was saying, and that prior is abstract enough that there’s no clear mechanism by which I can push against it. So whenever I made a good object-level case, she’d just take the other side — not necessarily because her reasons were better, all else equal, but because the prior was doing the work underneath without either of us really noticing.
There’s something clearly rational here that’s kinda unintuitive to get a grip on. If you have a strong prior, and someone makes a persuasive argument against it, but you can’t identify the specific mechanism by which their argument defeats it, you should probably update that the arguments against their case are better than they appear, even if you can’t articulate them yet. From the outside, this totally just looks like motivated reasoning (and often is), but I think it can be pretty importantly different.
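To make the rational version concrete, here’s a minimal Bayes calculation with invented numbers (nothing here is a real estimate):

```python
# Minimal Bayesian sketch: a strong prior can rationally survive a genuinely
# persuasive argument, because the posterior reflects both. Numbers invented.

prior = 0.90          # strong prior credence in long timelines
likelihood_ratio = 4  # P(hearing this argument | short timelines) /
                      # P(hearing this argument | long timelines)

prior_odds = prior / (1 - prior)                # 9:1 in favor of long timelines
posterior_odds = prior_odds / likelihood_ratio  # the argument cuts the odds by 4x
posterior = posterior_odds / (1 + posterior_odds)

print(f"posterior credence in long timelines: {posterior:.2f}")  # ~0.69
```

She should update toward short timelines, but still favor long ones, which from the outside can look a lot like not being moved by a good argument.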
The reason this is so hard to disentangle is that (unless your belief web is extremely clear to you, which seems practically impossible) it’s just enormously complicated. Your prior on timelines isn’t an isolated thing — it’s load-bearing for a bunch of downstream beliefs all at once. So the resistance isn’t obviously irrational; it’s more like… the system protecting its own coherence.
I think this means people should try their best to disentangle whether some object-level argument they’re having turns on genuine object-level beliefs or on pretty abstract priors (in which case, it seems less worthwhile to press on them).
Yea. In light of this, someone should start an AIS replication org 👀👀
Confidence level: strongly held, mostly opinionated, based on observation of (imo) bad LW norms.
We should stop using the phrase “epistemic status” and start using “confidence level.” In principle, “epistemic status” is meant to convey richer meta-information than confidence alone (e.g. the kind of evidence behind a claim, or how seriously it should be taken). In practice, it almost never does—on LW it’s usually just a clunkier way of saying “x confidence.”
If we actually want to convey more with less, we should just say “confidence level” and briefly qualify it with the relevant epistemic details (e.g. “low confidence, based on analogy,” or “high confidence, but mostly theoretical”). That’s clearer, less in-group-y, and lower friction. I think this is a good way to save some weirdness points.
(Alternatively, one could use “qualified confidence”—a bit more jargony, but traded for a bit more accuracy—though I personally like “confidence level” most).
I was pretty unimpressed with Dario Amodei in the recent conversation with Demis Hassabis at the World Economic Forum about what comes after AGI.
I don’t know how much of this is a publicity thing, but it felt like he wasn’t really taking the original reasons for going into AI seriously (i.e. reducing x-risk). The overall message seemed to be “full speed ahead,” justified mostly by some hand-wavy arguments about geopolitics, with the doomier risks acknowledged only in passing. Bummer.
As someone who runs a (university) rationality group, I am pretty unsure about this point. While we started off being more accommodating to those who hadn’t done the reading (starting with someone basically summarizing it), at some point—for reasons unknown to me—the standard changed, and now everyone does the readings by default. Not accommodating people, then, seems like something that pushes people in the right direction.
In case anyone wants it, Rob Long wrote an excellent summary and analysis of this paper here.
I appreciate the memetic-evolution framing, but I’m somewhat skeptical of the strong emphasis on tension-reduction as the primary (or even a major) explanatory driver of successionist beliefs. Given that you take successionism to be “false and dangerous,” it seems natural that your preferred explanation foregrounds memetics; but that sits a bit uneasily with the goal you state at the beginning: analyzing why people hold these views irrespective of their truth value.
Even if we bracket the object level, a purely memetic or cognitive-dissonance-based explanation risks drifting into an overly broad epistemic relativism/skepticism. On many accounts of epistemic justification—process reliabilism being one—what makes a belief justified is precisely that it’s formed by a reliable process. If we exclude the possibility that people arrive at their views through such processes and instead explain them almost entirely via dissonance-reduction pressures, we risk undermining (almost) all belief formation, not just things like successionism.
There’s a related danger: sociological/memetic explanations of belief formation can easily shade into ad hominem-esque critiques if not handled carefully (of course, some forms of ad hominem—i.e. pointing to someone’s likelihood of arriving at a true belief—are evidentially relevant, but they’re bad for epistemic hygiene and discourse). One could tell a similar story about why people believe in, say, AI x-risk—Tyler Cowen has suggested that part of the appeal is the feeling of possessing secret, high-stakes insight. While this may capture a fragment of the causal picture for some individuals, it’s clearly not the dominant explanation for most thoughtful, epistemically serious people. And if it were the main cause, we would be right to distrust the resulting beliefs—yet the memetic story doesn’t seem any more convincing as an explanation in one case than in the other (unless you already think one view is false and the other true).
So while memetic fitness and tension-resolution offer part of an explanation, I’m not convinced they do most of the work for most people. For most, object-level reasoning—about value theory, metaethics, consciousness, agency, and long-run trajectories—plays a substantial role in why they end up where they do. To the extent that successionist ideologies spread, part of that spread will track memetic dynamics, but part will also track genuine and often rigorous attempts to reason about the future of value and the structure of possible worlds.
Curious what people think about this, though; very open to constructive criticism, and I don’t feel very confident about it.
While I think LW’s epistemic culture is better than most, one thing that seems pretty bad is that occasionally mediocre/shitty posts get lots of upvotes simply because they’re written by [insert popular rationalist thinker].
Of course, if LW were truly meritocratic (which it should be), this shouldn’t matter — but in my experience, it descriptively does.
Without naming anyone (since that would be unproductive), I wanted to know whether others notice this too. And aside from simply trying not to upvote something because it’s written by a popular author, does anyone have good ideas for preventing this?
“Albania has introduced its first artificial intelligence “minister”, who addressed parliament on Thursday in a debut speech.” lol, what???
Not sure how much this really matters vs is just a PR thing, but it’s maybe something people on here should know about.
Thanks for the comment!
I hear the critique, but I’m not sure I’m as confident as you are that it’s a good one.
The first reason is that I’m not sure the credibility gained from having a wiki page wouldn’t outweigh the loss of control.
The second reason is that I don’t really think there is much loss of control (except in extreme cases like the ones you mention): you can’t be super ideological on wiki sites, beyond saying things like “and here’s what critics say.” On that point, I think it’s just pretty important for the standard article on a topic to include critiques of it (as long as they are honest/good rebuttals, which I’m somewhat confident the wiki moderators can ensure). Another point: LWers can just stay on top of the page to ensure the information isn’t clearly outdated or confused.
Curious to hear pushback, though.
Good call—that was from an earlier version of this post.
Good point, and I think I somewhat agree. If you think we just reach an intelligence explosion at some capability level (seems pretty plausible), you wouldn’t update all the way back to pre-training-only expectations, because we’d be closer and what really matters is hitting that point (and post-training can possibly take you there). So while you shouldn’t update all the way back, I still think the general point, that this is a large update back, still stands (the degree probably depends on some other priors, though, which I didn’t want to get into).
In a similar vein, I’ve always thought Chesterton’s fence reasoning was a bit self-defeating—in that using Chesterton’s fence as a conceptual tool is, in itself, often breaking it. People often do the traditional thing for cultural, familial, or religious reasons. While I understand it’s a heuristic and this doesn’t actually undermine the fence, it seems like an underrated point.
I saw this good talk on the Manifest YouTube channel about using historical circumstances to calibrate predictions—this seems better for training than regular forecasting because you have a faster feedback loop between prediction and resolution.
I wanted to know if anyone has recommendations for software or a site where I can do more examples of this (I already know about the Estimation Game). I would do this myself, but it seems like it would be pretty difficult to research a historical situation without learning the outcome. I would also appreciate takes on why this might be a bad way to get better at forecasting.
Here’s an argument against this view: yes, there is some cost to helping the citizens of a country, and the benefit becomes less great as you become a rentier state. However, while the benefits go down and economic prosperity accrues to the very few due to AGI, the cost of providing a good quality of life to others in the society also becomes significantly cheaper. It is not clear that the rate at which the benefits diminish actually outpaces the reduction in the costs of helping people.
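A toy numerical sketch of that rates comparison (every rate here is invented; the point is only that what matters is relative rates of decline, not decline per se):

```python
# Toy sketch: the incentive to help citizens declines post-AGI, but so does
# the cost of helping. Whether helping stays "worth it" depends on which
# declines faster. All rates are made up for illustration.

def net_value_of_helping(t: int, benefit_decay: float, cost_decay: float) -> float:
    benefit = 1.00 * (1 - benefit_decay) ** t  # value placed on helping citizens
    cost = 0.50 * (1 - cost_decay) ** t        # cost of providing a good life
    return benefit - cost

for t in range(0, 21, 5):
    print(t, round(net_value_of_helping(t, benefit_decay=0.10, cost_decay=0.30), 3))
# Net value stays positive here because costs fall faster than the incentive.
```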
In response to this argument, one might say that regular people become totally obsolete efficiency-wise, and the costs, while reduced, stay positive. However, this really depends on how you think human psychology works—while some people would turn on humans the second they can, there are likely some people who will just keep being empathetic (perhaps this is merely a vestigial trait from the past, but that’s irrelevant—the value exists now, and some people might be willing to pay some cost to act on it even beyond their own lives). We have a similar situation in our world: namely, animals. While people aren’t motivated to care about animals for power reasons (they could do all the factory farming they want, and it would be better for them), some still do (I take it that this is a vestigial trait of generalizing empathy to the abstract, but as stated, the story of how it came to be seems largely irrelevant).
Because of how cheap it is to actually help someone in this world, you may need only one or a few people to care just a little bit about helping others, and that could make everyone better off. Given that we have a bunch of vegans now (the equivalent of empathetic but powerful people post-AGI), and depending on how low the costs of making lives happy are (presumably there is a negative correlation between the cost of making lives better and the inequality of power, money, etc.), it might be the case that regular citizens end up pretty alright on the other side.
Curious what people think about this!
Also, many of the links at the beginning (YouTube, World Bank, rentier states, etc.) don’t work.
Makes sense. Good clarification!
I think people should know that this exists (Sam Harris arguing that misaligned AI is an x-risk concern, on the Big Think YouTube channel):
Credit also matters because it helps us identify which interventions actually produce good outcomes. For example, understanding whether I or 80,000 Hours was responsible for someone in my EA group securing a high-impact job informs where we should allocate resources to most effectively place more people in such roles.