Thomas Larsen comments on Thomas Larsen’s Shortform

Thomas Larsen 9 Mar 2026 16:30 UTC
110 points
77
I think that people overrate bayesian reasoning and underrate “figure out the right ontology”.
Most of the way good thinking happens IMO is by finding and using a good ontology for thinking about some situation, not by probabilistic calculation. When I learned calculus, for example, it wasn’t mostly that I had uncertainty over a bunch of logical statements, which I then strongly updated on learning the new theorems, it was instead that I learned a bunch of new concepts, which I then applied to reason about the world.
I think AI safety generally has much better concepts for thinking about the future of AI than others, and this is a key source of alpha we have. But, there are obviously still a huge number of disagreements remaining within AI safety. I would guess that debates would be more productive if we more explicitly focused on the ontology/framing that each other are using to reason about the situation, and then discussed to what extent that framing captures the dynamics we think are important.
I think it would be good if more people say things like “I think that’s a bad concept, because it obscures consideration X, which is important for thinking about the situation”.
Here are some widely used concepts I think are bad and I wish became less load bearing in AI safety discourse:
- “Fast” and “slow” takeoff; takeoff speeds in general. I think these concepts are very unclear and not super natural. There are various operationalizations of these (e.g. Paul’s “slow takeoff” = a gdp doubling over the course of the 4 years before the first single year that gdp doubles). This is obviously arbitrary, and I don’t see why world’s that meet this definition are worthwhile reasoning about separately from worlds that don’t meet this definition. I also think its easy to smuggle in lots of other correlations here, like slow takeoff = people are woken up, or slow takeoff = alignment is much easier, etc.
  - I feel more excited about talking about things like “I think milestone X will happen at date Y”, or “My median is that milestone X and Z are Y time apart”. For example, I think the concepts of automating coding, automating all AI research, automating ~the whole economy, and increasing earth’s energy output 1000x are all useful capability milestones, and it’s useful to talk about these.
- “scheming”. I think the definition of scheming is pretty unclear, and changes a bunch depending on the context.
  - Under some definitions I believe that the AIs are always going to be scheming, under others, it seems kind of narrow and unnatural.
  - I somewhat prefer the concepts from the “alignment over time” box in AI 2027.
- “gradual disempowerment”
  - I think that this conflates a bunch of scenarios / threat models together, some of which don’t make sense, some of which do make sense, but I don’t think that the solutions are very related.
  - Gradual disempowerment is often presented as not requiring any misalignment on behalf of the AIs. If there are AIs that are aligned with any human principals, we get a situation where, even without any coordination, AIs compete on behalf of their principals, and then give the principals whatever surplus they are able to produce.
  - Given that framing, we can now talk specific threat models. For example, maybe their is no surplus: warfare/competition eats away all the additional resources, and space is consumed purely by the optimal self replicators. Alternatively, maybe the AIs weren’t actually acting in the interests of humanity. Finally, maybe the process of competing hard was existentially catastrophic early on, e.g., maybe it resulted in the oceans being boiled (and humans didn’t take appropriate countermeasures), resulting in extinction.
Note: I’m not saying that all concepts that I think are bad can be saved or crystallized into good concept, often the best idea is to just forget about the original concept and use better/crisper concepts.
I should also maybe give some concepts I think are generally good: p(doom) from AI takeover, timelines, x-risk, s-risk, recursive self improvement, internal/external deployment, AI control, AI alignment. I don’t think any of these concepts are fully crisp, all have somewhat important edge cases, but I think they are good enough abstractions to be very useful for thinking about the future.
- TsviBT 9 Mar 2026 19:58 UTC
  24 points
  14
  Parent
  I would guess that debates would be more productive if we more explicitly focused on the ontology/framing that each other are using to reason about the situation, and then discussed to what extent that framing captures the dynamics we think are important.
  
  I strongly agree with this. However, I’ll note as one aspect of the discourse problem, that, at least in my personal experience, people are not very open to this. People’s eyes tend to glaze over. I do not mean this as a dig on them. In fact, I also notice this in myself; and because I think it’s important, I try to incline towards being open to such discussions, but I still do it. (Sometimes endorsedly.)
  
  Some things that are going on, related to this:
  - It’s quite a lot of work to reevaluate basic concepts. In one straightforward implementation, you’re pulling out foundations of your building. Even if you can avoid doing that, you’re still doing an activity that’s abnormal compared to what you usually think about. Your reference points for thinking about the domain have probably crystallized around many of your foundational concepts and intuitions.
  - Often, people default to “questioning assumptions” when they just don’t know about a domain but want to sound smart / don’t want to try to do the difficult work of understanding the domain. That can be tiring / irrelevant for an expert.
  - The Criteria for concepts being good are quite muddled and difficult, at least AFAIK.
  (Cf. https://www.lesswrong.com/posts/TNQKFoWhAkLCB4Kt7/a-hermeneutic-net-for-agency )
  
  I think it would be good if more people say things like “I think that’s a bad concept, because it obscures consideration X, which is important for thinking about the situation”.
  
  Totally agree, but I think it’s pretty difficult to explain these things. Part of what’s going on is that, if I have concept X and you don’t, and therefore you don’t think about Y as well as you could, that doesn’t mean I can justify X to you, necessarily. You probably have alternate concepts to partially think about Y. For one thing, maybe your concepts actually are as good or better than my X! In which case I should be trying to learn from you, not teach you. For another thing, your specific pattern of thinking about Y in a partially-correct but impoverished way is particular way of being bad (“each unhappy family...”). So, I would have to track your specific errors / blindspots, in order to make a clear + concise case to you that you should use X. (This is a scenario where live convo is just strictly better than text walls.)
- romeostevensit 11 Mar 2026 21:26 UTC
  6 points
  4
  Parent
  As Robin Hanson put it: finding new considerations often trumps fine tuning existing considerations.
  
  I’d say this is expected in worlds with high dimension complexity, large differences in rewards, hidden information (both external and internal), and adversarial dynamicss.
- David Matolcsi 9 Mar 2026 18:41 UTC
  6 points
  0
  Parent
  Can you say more about how you think about scheming and what would be a useful definition in that space?
  - Thomas Larsen 21 Mar 2026 19:26 UTC
    2 points
    0
    Parent
    Sorry for the slow response. I wrote up some of my thoughts on scheming here: https://www.lesswrong.com/posts/q8fdFZSdpruAYkhZi/thomas-larsen-s-shortform?commentId=P8GTDD5CLMxr9tczv
- Vladimir_Nesov 9 Mar 2026 18:08 UTC
  6 points
  0
  Parent
  Key constructions can often be made from existing ingredients. A framing rather than “ontology” is emphasis on key considerations, a way of looking at the problem. And finding which framings are more useful to lean on feels more like refinement of credence, once you have the ingredients.
  
  Inventing or learning new ingredients can be crucial for enabling the right framing. But the capstone of deconfusion is framing the problem in a way that makes it straightforward.
- Richard_Ngo 10 Mar 2026 13:40 UTC
  5 points
  0
  Parent
  Strong agree. In case you haven’t read it yet, I argue similarly here and here. Except that I’m also more skeptical of the concepts you listed as good: I’d say most of them used to be good concepts, but we now have much more conceptual clarity on AGI and the path leading to it and so need higher-resolution concepts.
- testingthewaters 9 Mar 2026 20:34 UTC
  5 points
  1
  Parent
  Some additional hurdles: “I think your ontology is not well adapted for this issue” sounds a lot like “I think you are wrong”, and possibly also “I think you are stupid”. Ontologies are tied into value sets very deeply, and so attempts to excavate the assumptions behind ontologies often resemble socratic interrogations. The result (when done without sufficient emotional openness and kindness) is a deeply uncomfortable experience that feels like someone trying to metaphysically trip you up and then disassemble you.
- Jan_Kulveit 10 Mar 2026 19:44 UTC
  4 points
  0
  Parent
  I agree “figure out the right ontology” is underrated, but from the list of examples my guess is I would disagree whats right and expect in practice you would discard useful concepts, push toward ontologies making clear thinking harder, and also push some disagreements about whats good/bad to the level of ontology, which seems destructive.
  
  - Fast and Slow takeoffs are bad names, but the underlying spectrum “continuous/discontinuous” (“smooth/sharp”) is very sensible and one of the main cruxes for disagreements about AI safety for something like 10 years. “I think milestone X will happen at date Y” moves the debate from understanding actual cruxes/deep models to ~~dumb~~ shallow timeline forecasting.
  
  - “scheming” has become too broad, yes
  
  - “gradual disempowerment”—possibly you just don’t understand the concept/have hard time translating it to your ontology? If you do understand Paul’s “What failure looks like”, the diff to GD is we don’t need ML to find greedy/influence-seeking pattern; our current world already has many influence-seeking patterns/agencies/control systems other than humans, and these patterns may easily differentially gain power over humans.
  -- usually people who don’t get GD are stuck at the ontology where they think about “human principals” and gloss over groups of humans or systems composed of humans not being the same as humans
  
  p(doom) is memetically fit and mosty used for in-group signalling; not really that useful variable to communicate models; large difference in “public perception” (like between 30% and 90%) imply just a few bits in logspace
  
  xrisk and srisk are useful and reasonably crisp
  
  AI alignment had a meaning but is currently mostly a conflationary alliance
  
  AI control is a sensible concept which increases xrisk when pursued as a strategy
- Daniel Kokotajlo 9 Mar 2026 21:51 UTC
  4 points
  2
  Parent
  I mostly agree I think—but, how do you teach/train to get good at finding the right ontology? Bayesian reasoning is at least something that can be written down and taught, there’s rules for it.
  - David James 10 Mar 2026 18:47 UTC
    1 point
    0
    Parent
    Recognizing the importance of choosing and comparing models / concepts might be a prerequisite concept. People learn this in various ways … When it comes to choosing what parameters to include in a model, statisticians compare models in various ways. They care a lot about predictive power for prediction, but also pay attention to multicollinearity for statistical inference. I see connections between a model’s parameters and an argument’s concepts. First, both have costs and benefits. Second, any particular combination has interactive effects that matter. Third, as a matter of epistemic discipline, it is important to recognize the importance of trying and comparing frames of reference: different models for the statistician and different concepts for an argument.
- Oliver Sourbut 11 Mar 2026 13:01 UTC
  2 points
  −2
  Parent
  nit: Christiano operationalised ‘slow takeoff’ via ‘world product’, not GDP. I’m not sure exactly what he meant by that (or if he had a more concrete operationalisation), but it does strike me as wise to not anchor to GDP which is awfully fraught and misleadingly conservative.
  
  ETA: fake news! I checked and while he starts talking about ‘output’, he later seems to operationalise it as GDP specifically
- Mateusz Bagiński 10 Mar 2026 13:03 UTC
  2 points
  0
  Parent
  Generally strongly agree.
  One caveat:
  I should also maybe give some concepts I think are generally good: p(doom) from AI takeover, timelines, x-risk, s-risk, recursive self improvement, internal/external deployment, AI control, AI alignment. I don’t think any of these concepts are fully crisp, all have somewhat important edge cases, but I think they are good enough abstractions to be very useful for thinking about the future.
  There’s a difference between [a concept in the sense that it was originally coined or in the sense that some specific group uses it] being good and [a concept as it is used across different communities or as an indefinite socioepistemic blob of meaning and associations being good]. Alignment is a useful concept in something like its original formulation, but it has been incredibly diluted and expanded. https://x.com/zacharylipton/status/1771177444088685045 https://www.lesswrong.com/posts/p3aL6BwpbPhqxnayL/the-problem-with-the-word-alignment-1
  More recently, “loss of control” met the same fate. From https://www.apolloresearch.ai/research/loss-of-control/ (emphasis mine):
  First, the report assesses existing definitions of LoC in AI literature, as well as in other safety-critical industries such as aviation, nuclear, and cybersecurity, to arrive at a common conceptualization of LoC. However, we learnt that existing definitions of LoC are diverse. Some focus on loss of reliable direction or oversight; others emphasize situations in which there is no clear path to regaining control. Some implicitly include failures that are already occurring in current systems, while others implicitly limit LoC to scenarios involving highly advanced or even superintelligent AI.
  [...]
  This exercise allowed us to infer that LoC is not a single point but a spectrum that can cluster into three qualitatively distinct bands. On this basis, the report proposes a taxonomy with three degrees: Deviation, Bounded LoC, and Strict LoC.
  Deviation captures events that cause some harm or inconvenience but lack the requisite severity and persistence to reach the economic consequences threshold that the U.S. Department of Homeland Security, federal agencies, and the intelligence community use to demarcate national-level events in the Strategic National Risk Assessment.
- ChristianKl 9 Mar 2026 23:11 UTC
  2 points
  0
  Parent
  When it comes to good ontology, more people should understand what Basic Formal Ontology is. When it comes to AI alignment, it might be productive if someone writes out a Basic Formal Ontology compatible ontology of it.
  - CstineSublime 10 Mar 2026 10:03 UTC
    3 points
    0
    Parent
    I have never heard of this before let alone understand it, can you recommend any good primers? All the resources I can find speak in annoyingly vague and abstract sense like “a top-level ontology that provides a common framework for describing the fundamental concepts of reality.” or “realist approach… based on science, independent of our linguistic conceptual, theoretical, cultural representations”.
    - ChristianKl 11 Mar 2026 10:36 UTC
      2 points
      0
      Parent
      I think the general issue is that while people in this community and the AI alignment community have quite seriously thought about epistemology but not about ontology.
      There’s nothing vague about the sentence. It’s precise enough that’s it’s a ISO/IEC standard. It’s however abstract. If you have a discussion about Bayesian epistemology, you are also going to encounter many abstract terms.
      BFO grew out of the practical needs that bioinformaticians had at around 2000. The biologists didn’t think seriously about ontology, so someone needed to think seriously about it to enable big data applications where unclear ontology would produce problems. Since then BFO has been most more broadly and made into the international standard ISO/IEC 21838-2:2021.
      This happens in a field that calls themselves applied ontology. Books like Building Ontologies with Basic Formal Ontology by Robert Arp, Barry Smith, and Andrew D. Spear explain the topic in more detail. Engaging with serious conceptual framework is work but I think if you buy the core claim of ‘I think that people overrate bayesian reasoning and underrate “figure out the right ontology”’ you shouldn’t just try to develop your ontology based on your own naive assumptions about ontology but familiarize yourself with applied ontology. For AI alignment that’s probably both valuable on the conceptual layer of the ontology of AI alignment but might also be valuable for thinking about the ontological status of values and how AI is likely going to engage with that.
      After Barry Smith was architecting BFO and first working in bioinformatics he went to the US military to do ontology for their big data applications. You can’t be completely certain what the military does internally but I think there’s a good chance that most of the ontology that Palantir uses for the big data of the military is BFO-based. When Claude acts within Palantir do engage in acts of war in Iran, a complete story about how that activity is “aligned” includes BFO.
      - CstineSublime 12 Mar 2026 9:29 UTC
        3 points
        2
        Parent
        There’s nothing vague about the sentence.
        I strongly disagree. “describing the fundamental concepts of reality” is unhelpfully vague, what are these fundamental concepts? I don’t know and can’t guess what it is from that sentence, which is ironic considering it is an Ontological framework.
        ChristianKl 12 Mar 2026 9:47 UTC
        2 points
        0
        Parent
        The word reality has a clear meaning in ontological realism. If you lack that background then it feels vague.
        This is similar to saying that when someone speaks about something being statistically significant they are vague because significant is a vage word. You actually need to understand something about statistics for the term not to feel vague.
- MKodama 31 Mar 2026 17:49 UTC
  1 point
  0
  Parent
  Most of the way good thinking happens IMO is by finding and using a good ontology for thinking about some situation, not by probabilistic calculation.
  As a side point, this is a trendy view in epistemology. Most of our learning in real life is not a matter of reallocating credence among hypotheses we were already aware of according to Bayes’s theorem. Rather, most of our learning is becoming aware of new hypotheses that weren’t even in the domain of our prior credence function.
  Beyond Uncertainty by Steele & Stefánsson is a good (and short) overview of approaches to awareness growth in formal epistemology.
- MinusGix 10 Mar 2026 22:27 UTC
  1 point
  0
  Parent
  I somewhat agree, but I also do think “apply your Bayesian reasoning to figuring out what hypotheses to privilege” is how people decide which structural hypotheses (ontology) describe the world better. So I feel you’re taking an overly narrow view. Like, for scheming, you ask how these different notions inform what you can observe, the way the AI behaves, and methods to avoid it.