I feel like the Cosmic Schelling Answer to “Should you act according to your own internal sense of morality, or according to the Cosmic Schelling Answer?” is “you should act according to your own internal sense of morality” (this is because the argument is simpler, and also, IDK, it’s not like I actually need to coordinate with other cosmic civilizations that don’t exist right now).
But even not taking the frame as a given, I don’t really understand what I am supposed to do with this concept. Like, why would I want to behave according to Cosmic Schelling Morality? I would like to behave according to my all-things-considered morality, and expect other agents to do the same, which in some circumstances means I want to act in accordance with simple-to-identify Schelling points, and in other cases means I want to pursue my personal interests intently.
I would like to behave according to my all-things-considered morality,
Sounds pretty reasonable to me! Intentionally not considering stuff is fraught.
which in some circumstances means I want to act in accordance with simple-to-identify Schelling points, and in other cases means I want to pursue my personal interests intently.
FWIW this also sounds pretty correct/healthy to me.
I need two points of clarification to answer your questions:
I don’t really understand what I am supposed to do with this concept.
Whose morality would you like me to use as defining the “supposed” here? My guess at yours? My own? Something else?
Like, why would I want to behave according to Cosmic Schelling Morality?
I’m not sure what you mean by “behave according to cosmic Schelling morality”. Do you mean
a) consider it at all?
b) consider it as the only determinant of your behavior?
c) something else?
Whose morality would you like me to use as defining the “supposed” here? My guess at yours? My own? Something else?
Your best guess at my own! I.e. I am pretty sure you think something good will happen to me (by my own lights) if I learn about this, and I have some vague pointers for what that might be, but my guess is you have thought more about it and could explain more (right now I have thought about it for like the 15 minutes that it took me to read the post, which was a good use of my time, but I don’t currently expect by default to come back to it).
I’m not sure what you mean by “behave according to cosmic Schelling morality”. Do you mean
I don’t really understand what I am supposed to do with this concept.
Hypothesis A: Lightcone Infrastructure, insofar as it’s interested in the lightcone, might occasionally be philosophically interested in cosmic Schelling norms for their potential relevance to lightcone-sized coordination events, including potential encounters with other civilizations, civilizational offshoots, world-simulators, or vivarium boundaries.
But if that didn’t already jump out to you as interesting…
Hypothesis B: For you, the conceptual drivers of the post may be more useful than its overall thrust, as points to reflect upon and/or reference later. Specifically:
the Schelling transformation Q ↦ S(P,Q) on questions for various populations P aside from the cosmos (one rough way to formalize this is sketched just after this list), including cases where P is
a) yourself, i.e., the population of your own subagents / neural processes;
b) groups you’re a part of; or
c) groups you’re not a part of.
I’ve considered writing a follow-up post about the dynamics of the relationships between P-Schelling goodness for various overlapping and interacting populations P, but I suspect if you just boggle at the idea it might bear some fruit for you independently, and faster than waiting for me to blog about it.
(Personally I think P=self is a super interesting case for defining what is a ‘decision’ for an embedded agent made of parts that need to coordinate, but that’s probably more of a me-thing to be interested in.)
Scale-invariant norms: I suspect scale invariance of certain normative principles is under-appreciated in general, and probably in particular by you, as a recursively potent determinant of norm emergence at large scales. For instance, you can pretend the ~10^10 humans alive today are organized into a depth-6 social hierarchy tree with a branching factor of ~100 (~Dunbar’s number), and think about how the Schelling norms of each node along with its children might evolve (a toy numerical sketch of this follows the list). In reality the structure is not a tree, but you probably get the idea.
the Schelling participation effect — both sections on it — is useful as a partial model of the ‘snowball’ effect one sometimes sees in movement-building and/or Silicon Valley hype cycles.
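To make the S(P,Q) notation above concrete, here is one rough way to cash it out (my own gloss, not notation from the post): for a question Q with candidate answers A(Q), take S(P,Q) to be whichever answer members of P, answering independently and without communication, are most likely to converge on as focal.

```latex
% One possible gloss (mine, not from the post): the Schelling transformation
% maps a question Q to the answer that population P would most reliably
% converge on without communicating.
\[
  S(P, Q) \;=\; \arg\max_{a \,\in\, A(Q)} \;
    \Pr_{p \sim P}\!\left[\, p \text{ independently picks } a
      \text{ as the focal answer to } Q \,\right]
\]
```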
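And here is a toy numerical sketch of the scale-invariance point above (my own illustration with made-up numbers; `BRANCHING` and `settled_norm` are names I invented, not anything from the post): each Dunbar-sized group adopts the plurality norm of its members, so a norm that is only mildly focal among individuals becomes nearly unanimous a few levels up the hierarchy.

```python
# Toy illustration (mine, not from the post): plurality ("Schelling") norms
# bubbling up a Dunbar-sized hierarchy get amplified at every level.
import random
from collections import Counter

BRANCHING = 100  # ~Dunbar's number; 100**5 = 10**10 ~ everyone alive today


def settled_norm(depth: int) -> int:
    """Return the norm a node `depth` levels above the individuals settles on."""
    if depth == 0:
        # An individual holding norm 0, 1, or 2, with only a mild plurality for 0.
        return random.choices([0, 1, 2], weights=[0.40, 0.35, 0.25])[0]
    children = [settled_norm(depth - 1) for _ in range(BRANCHING)]
    return Counter(children).most_common(1)[0][0]  # adopt the plurality norm


if __name__ == "__main__":
    # Depth 3 (10^6 individuals) already shows the amplification;
    # depth 5 would cover the whole ~10^10-person tree.
    print(settled_norm(3))
```

Run a few times, the mild 40/35/25 split at the leaves essentially always resolves to the same norm at the top, which is the “recursively potent” amplification being pointed at.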
Hypothesis C: I didn’t argue or even speculate this in the post, but I suspect cosmic Schelling norms are probably easier to align AI with than arbitrary norms, for better or for worse. Probably that deserves a separate essay, but in case it’s intuitive to you, it might be another idea that bears fruit faster by you boggling at it yourself instead of waiting for me to write about it.
I mean “a) consider it at all”.
Roger that!
Coming back to (A), I think not considering cosmic Schelling norms would be sort of selectively ignoring something that belongs in the “all” of your “all things considered”… not an overall determinant of behavior, but something to consider with regard to the lightcone, if that’s still something you think about (I’m genuinely unsure how much the lightcone scope still interests you with regard to your personal mission/drive).
The bit I’d expect to feel most relevant to Habryka is this one (only briefly mentioned in this post; it feels like there’s a whole other post waiting to be written someday about it):
to contribute to present-day Earth as a civilization being recognizable as a promising potential coordination partner, rather than noise to be filtered out or a cosmically threatening process to be contained.
Where, if you take this seriously, it might change some of your priorities about how to do various kinds of coordination-with-humans. Because you might think the biggest win condition is being a good citizen of the acausal multiverse that other civilizations notice and trade with.
My vague impression is Habryka agrees, but thinks you can basically worry about that after leaving the acute risk period. My vague impression is Critch thinks something like “how you and your species conduct yourselves during the acute risk period is a stronger consideration”?
Curious if that sounds right to either of you.
Critch thinks something like “how you and your species conduct yourselves during the acute risk period is a stronger consideration”?
I wouldn’t argue hard for it being “a stronger consideration” because I think that’s a harder question.
But I would argue hard for it being “a consideration”, especially if someone was like “boo this is worthless and should round to zero”. And, if someone finds the “cosmic” framing distracting, I would also argue the following, in terms of real-world relevance this century:
humanity and AI are both less likely to wind up in mutually destructive conflict if we both pay non-zero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
individual groups of humans are less likely to get into needlessly destructive AI-powered wars with each other if we pay some non-zero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
humanity and AI are both less likely to face simulation shutdown if we pay nonzero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
These are primarily not arguments meant to override all other considerations, but arguments that these should be considerations at all.
Nod. (fwiw I meant “stronger consideration” to be “stronger than I think habryka thinks it is” in relative terms)
humanity and AI are both less likely to face simulation shutdown if we pay nonzero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
I was thinking about this last night, and then remembered that in Acausal normalcy you had also argued that simulation is pretty expensive and probably not how most acausal interaction works, and then was a bit confused about what you were arguing here and/or what was likely to be true, a la
Writing out arguments or formal proofs about each other is much more computationally efficient, because nested arguments naturally avoid stack overflows in a way that nested simulations do not.
Personally I think the world as we know it is more likely to be in a vivarium than a simulation, though many of the same principles apply in terms of there being a powerful outside force that can shut down Earth-originating civilization if it appears to have bad morals. And yes, people writing proofs about us without simulating us is another pathway for our actions to matter separately from their naive causal consequences. And, although fewer people understand the proof-theoretic angle, I do think the effect is even stronger than the simulation effect, because simulations are just a special case of proofs.
Why?
Because there are so many more possible vivaria than simulations. Like, a simulation is simulating something, which involves a kind of match between what’s in the simulation and what it’s supposed to represent/simulate. If it contains life and life is the object of study, it’s a vivarium, and I think most simulations are vivaria, but most vivaria are probably not simulations.
In our case, if we are ultimately running on a computer, then wouldn’t that mean that we are a simulation? It seems obvious what whoever is running it would be trying to do: the intention would be to simulate a pre-AGI civilization.
Could we be running on a computer and also be a vivarium?
I get that a lot of people use “simulation” and “computer program” as basically synonyms, but that’s a bit linguistically impoverished for the hard work of analyzing distinct metaphysical hypotheses and their consequences. Consider that ideal behavior is different in
a) computer worlds that are built to mimic an existing world, in which case our “job” is to be similar to whatever is “out there” that we’re a simulation of, versus
b) computer worlds that are built to be a de novo home for a new kind of life or being, such as for entertainment or exploratory science.
(a) is a “simulation”, and if I discover I’m in a simulation, I might just go ahead and act like I’m not, to help the simulator with their intended purpose of mimicking something that’s not in a simulation.
(b) is not a “simulation”, and if we ever discover we’re in one of those, I might look for other instructions for what I’m supposed or expected to do, from the creator(s). I call this a ‘digital vivarium’ but I’m open to other terms, just not ‘simulation’ which fails to correctly draw the distinction of, well, not being a simulation of anything.
if we are ultimately running on a computer, then wouldn’t that mean that we are a simulation?
Not really, if the computer program we’re inside is not designed to simulate anything.
Could we be running on a computer and also be a vivarium?
Yes, because artificial life is still life.
Ok but if we’re on a computer then isn’t it clear we’re a simulation, not a vivarium, bc clearly it was designed to simulate the behavior of a pre-agi civ?
bc clearly it was designed to simulate the behavior of a pre-agi civ?
No? This makes no sense to me, unless you define “computer” to mean “computer built by a future human civilization”, which is a weirdly human-centric definition of computer.
Here’s a weirdly specific scenario to help illustrate why:
Suppose I use a 2040 MacBook to build an alien-like digital world with novel 13-tentacled lifeforms in it that are not a simulation of anything I believe exists. The aliens in that digital world have their own computers that look nothing like MacBooks, but I’m still quite interested in what they’ll do with their alien computers, and whether they’ll make AGI. The computers run on “greenstone” circuits that are more like Minecraft redstone circuits than electrical transistors, but are actually different from both (not a simulation of either). The creatures then discuss, in their own way of communicating, as follows:
Creature 1: “If we’re in a computer, it’s sure to be a simulation.”
Creature 2: “Not necessarily. Even if we’re in a computer, it could be in some kind of digital vivarium that’s not a simulation of anything, just a computational world with artificial lifeforms created to live within it (us).”
Creature 1: “But if we’re in a computer, clearly it was designed to simulate the behavior of a pre-agi civ?”
Creature 2: “No? If we’re in a vivarium, its creators may be a wholly different civilization, for whom we are not a simulation of anything. Like, who knows, maybe they only have 4 limbs! Unless by ‘computer’ you specifically mean one of these 13-tentacle-operated devices that we-specifically built from greenstone circuits… then sure, yeah, our future civilization would probably be running the simulation in that case. But that’s a weirdly us-specific definition of ‘computer’ don’t you think?”
Thanks, that’s helpful.
What seems clear to me is that our world is the result of fairly simple laws of physics, and our creators wanted to know how those simple laws would play out. They’re saying “if there was a universe with these laws, what would happen”. (This is what I’d meant by “simulation”.)
I agree it’s less clear that they’re doing this bc they think those laws also describe a real-world process (somewhere in the multiverse) and they want to predict the outcome of that process. (This is what you meant by “simulation” and I think your def is better.)
So I understand where you’re coming from better now. Thanks!
But I still think we’re in a simulation, in your stronger sense of the word! Why? Bc:
1. other civs will reasonably believe our laws of physics describe part of the multiverse,
2. this gives them a strong instrumental reason to simulate this,
3. absent 1 and 2 there aren’t comparably strong reasons to run vivariums like our world.
absent 1 and 2 there aren’t comparably strong reasons to run vivariums like our world.
Why the focus on “reasons”?
Many things exist from causes that are not “reasons” in the sense of a decision-maker choosing something with an objective. All reasons are causes, but not all causes are reasons. For example, reproduction is a process that creates a lot of things without “reasons”, in the central sense of the word where a “reason” refers to something “reasoning”.
And, if you wonder what caused you (or us) to exist, a good contender is “a causing-things-to-exist maximizer”.
Is there a strong enough prior on causing-things-to-exist-maximizers in, eg, the universal distribution, though?
Which universal distribution?
Some universal distributions are full of agents that make choices that make that distribution not a valid model of reality after the decisions are made (self-defeating). Other distributions are full of agents making decisions that ratify the distribution (self-fulfilling).
Distributions that aren’t fixed points under reflection about what they decide about themselves are not coherent models of reality.
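For what it’s worth, a minimal way to phrase the fixed-point condition being gestured at here (my notation, not anything from the thread): let R(μ) be the distribution over histories you actually get when the agents inside those histories reason and decide using μ as their prior.

```latex
% My notation, not from the thread: R maps a candidate prior over histories
% to the distribution of histories actually produced when the embedded
% agents decide using that prior.
\[
  R(\mu) \neq \mu \;\;\text{(self-defeating)}
  \qquad\text{vs.}\qquad
  R(\mu^{*}) = \mu^{*} \;\;\text{(self-fulfilling; a coherent self-model)}
\]
```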