Schelling Goodness, and Shared Morality as a Goal
Also available in markdown at theMultiplicity.ai/blog/schelling-goodness.
This post explores a notion I’ll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like “X is good” or “X is bad.” They are claims about a class of hypothetical coordination games in the sense of Thomas Schelling, where the task being coordinated on is a moral verdict. In each such game, participants aim to give the same response regarding a moral question, by reasoning about what a very diverse population of intelligent beings would converge on, using only broadly shared constraints: common knowledge of the question at hand, and background knowledge from the survival and growth pressures that shape successful civilizations. Unlike many Schelling coordination games, we’ll be focused on scenarios with no shared history or knowledge amongst the participants, other than being from successful civilizations.
Importantly: To say “X is Schelling-good” is not at all the same as saying “X is good”. Rather, it will be defined as a claim about what a large class of agents would say, if they were required to choose between saying “X is good” and “X is bad” and aiming for a mutually agreed-upon answer. This distinction is crucial, to avoid interpreting the essay as claiming moral authority beyond what is actually implied from the definitions.
I’ll also occasionally write a speculative paragraph about questions that seem important but that I’m not at all confident about answering. Those paragraphs will be labeled with (Speculation) at the start, to clearly separate them from the logic of the remaining document. The non-speculation content is presented with minimal unnecessary hedging: the language is hedged only when I’m convinced it needs to be for correctness, and otherwise assertions are stated directly. That is: for the sake of clarity, performative uncertainty is not included.
This essay is not very “skimmable”
Argumentation is used throughout this essay to explore what logically or probabilistically follows from hypothetical conditions. For instance, given a population of agents explicitly trying to converge on a shared moral answer, with common knowledge of this goal, and a forced binary answer space of {good, bad}, what would they most likely say?
If you’re just skimming, it might be easy to miss that most of these conditions are part of the stipulations of thought experiments or definitions, not claims about the world requiring independent verification or defense. For instance, if you find yourself thinking “but common knowledge isn’t guaranteed!” or “what about third options?”, those objections are probably targeting the premises of a question posed within the essay, rather than assertions about reality. So if you encounter a claim that seems objectionable, it’s probably worth looking back to see if it’s been stipulated as part of a thought experiment, or derived from such stipulations, rather than being asserted as fact.
This essay does make some unconditional assertions about the world, and those assertions usually require the arguments leading up to them for support. The real-world assertions are mostly about how cosmically large classes of real-world intelligent agents would respond to certain questions about each other. The questions involve unrealistic thought experiments about common knowledge, but the assertions about how real agents would respond to questions about those thought experiments are, I believe, well-supported by the arguments presented here.
In summary, it’s important throughout to track the difference between
- thought experiment stipulations, versus
- assertions about what large classes of real agents would say about those thought experiments.
Pro tanto morals, ‘is good’, and ‘is bad’
The terms “good” and “bad” are used throughout this essay. Now, without agreeing on any complete definition of “good” and “bad”, we can at least agree on the following fundamental observation about the behavioral effects of these terms:
Encouragement asymmetry: in most ordinary use, calling a behavior “good” tends to encourage that behavior relative to calling it “bad”, while calling a behavior “bad” tends to discourage the behavior relative to calling it “good”.
Some points of clarification:
- This is not a definition of “good” or “bad”; it’s an observation about the real-world usage and effects of these terms, which we’ll use as a foundation for deriving other conclusions without assuming any particular definition of “good” or “bad”.
- By encouragement here, I mean a simple, non-normative causal tendency: in typical social contexts, one agent’s labeling of a behavior as “good” or “bad” will shift another agent’s probability of doing it. I’m open to other words for this idea of “encouragement” — perhaps “promotion” or “reinforcement”. The underlying concept is the key, not the word: labeling a behavior “good” tends to increase its probability, while labeling it “bad” tends to decrease it.
Equipped with this observation, we’ll treat uses of “is good” and “is bad” as making (at least) pro tanto moral assertions — which tend to encourage or discourage a behavior to some extent, all else equal, without necessarily claiming to dominate every other consideration or tradeoff. Ceteris paribus is Latin for “all else equal”, so these could also be called ceteris paribus moral assertions.
The “all else equal” qualifier here is important: saying “lying is bad” doesn’t mean lying can never be justified, only that the lying aspect of an action counts against it in moral evaluation. This is a deliberately minimal handle on moral language, so that we can avoid committing to a complete definition of goodness while still saying something meaningful. Examples include:
“lying is bad”
“killing is bad”
“healing is good”
“honesty is good”
The claim “lying is bad” is importantly different from “no one should ever lie” or “lying is the worst thing you can do”, which are manifestly stronger claims. Still, regarding any plan you might have that involves lying, I suspect we can agree that “lying is bad” at least means:
the lying aspect of your plan is a strike against it, not in favor of it;
“lying” gets a negative sign in our value function(s) for evaluating the plan’s desirability;
even if your plan is overall worth doing, the lying is an undesirable aspect in our reasoning about whether you should do it.
Simply put, when a plan involves lying, that fact belongs in the “cons” column of the pros-and-cons list.
Part One: The Schelling Participation Effect
Imagine the following two scenarios, which are both versions of a familiar example for teaching about Schelling points.
Suppose you’re visiting Paris, and you and I have agreed to meet there tomorrow during the daytime, but we haven’t exchanged any hint about where or when — only: “in Paris during the daytime tomorrow.” Now…
- In Version A: You just lost your backpack with your cell phone and computer. You don’t know whether I’ve sent details about where to meet, or whether I’m expecting you to have received them. I might still have full communication access and might assume you do as well. Crucially, we lack common knowledge that we’re in a coordination-without-communication game — you might suspect we are, but you don’t know that I know that you know, and so on.
- In Version B: The cell network and internet seem to be down, for all of Paris. You expect me to know that, and to expect you to know that, and so on. That is, assume we have common knowledge of the communication blackout.
In each version, you need to guess where and when you should meet me, and then actually do it.
Think for at least 30 seconds about each version — especially if you haven’t encountered this before — and notice how you feel differently about Version A versus Version B in terms of your chances of guessing the right place to meet me.
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
If you don’t have an answer in mind for both, stop, and keep thinking until you have one.
Now that you have an answer…
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
I’m guessing:
You picked (or predicted) the Eiffel Tower for the location,
You picked noon for the time of day (unless you missed the daytime constraint and picked midnight, which folks occasionally do), and
You’re a lot more confident about finding me in Version B, because you know I’m playing the same guessing game as you, and you expect me to guess the most guessable answer.
Now consider…
Version C: You and I and 10 randomly sampled 2026 humans are all in the same situation, all guessing where the largest subset of the group will show up for the meeting. We have common knowledge of this, and that everyone is trying to guess the same answer.
Pause to reflect on this, and how our intent to converge with additional people affects your confidence level that you will pick the most common answer.
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
In Version C, are you more confident, or less confident, that you will guess correctly?
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
Probably you’re more confident, right? If the strangers are randomly sampled independently from the same background population (2026 humans), your confidence in the modal (most common) answer rationally goes up as the group size grows. Indeed, it’s often easier to predict something about the average or modal behavior of a large group — like, knowing there will be a traffic jam on the Bay Bridge tomorrow — than predicting the behavior of individuals — like, knowing who exactly will end up in the traffic jam.
(For math-lovers, here is some non-crucial but interesting detail: How quickly confidence grows in the modal answer, as a function of the group size, depends somewhat on the distribution — especially how separated the top option is from the runner-up. To simplify, if we ignore that the participants want to converge in their answer, there is already some statistical convergence to be expected. In finite-choice settings with a clear leader, the probability of misidentifying the population mode using the sample mode decays very quickly, typically exponentially in n when the gap is fixed. In other settings — e.g., estimating the location of a mode of a smooth continuous distribution with a twice-differentiable concave peak — sometimes the convergence rates can be slower, exhibiting asymptotics on the order of n^(-1/3), such as in Chernoff’s 1964 paper, Estimation of the mode. That’s not at all crucial for this essay, though, since for the boolean questions we’ll examine later, estimating the mean parameter is enough to deduce which outcome is more likely than not, so the standard n^(-1/2) scaling (CLT) for the sample mean’s estimation error will be applicable.)
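The non-crucial statistical point above can be checked with a short simulation. This is a sketch with made-up example distributions; the function name and parameters are mine, not anything from the literature cited:

```python
import random
from collections import Counter

def p_sample_mode_correct(probs, n, trials=2000, seed=0):
    """Estimate the probability that the sample mode of n draws
    matches the population mode of a finite-choice distribution."""
    rng = random.Random(seed)
    options = list(range(len(probs)))
    true_mode = max(options, key=lambda i: probs[i])
    hits = 0
    for _ in range(trials):
        sample = rng.choices(options, weights=probs, k=n)
        sample_mode = Counter(sample).most_common(1)[0][0]
        hits += (sample_mode == true_mode)
    return hits / trials

# A clear leader (0.5 vs. 0.3): accuracy climbs quickly with n.
for n in (5, 20, 80):
    print(n, p_sample_mode_correct([0.5, 0.3, 0.2], n))
```

With a clear leader, even modest sample sizes identify the population mode reliably; shrinking the gap between the top two options slows this down, as the parenthetical above describes.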
What we’ll call the Schelling participation effect is something a bit more potent than this statistical mode estimation: a recursive effect that counters the risk aversion that might otherwise discourage guessing at all.
In each of Versions A, B, and C, let’s now imagine we all naturally dislike the time cost of walking all the way to our guessed meeting locations. So, there’s a cost to attempted coordination. And you might not feel like paying that cost if you’re too uncertain about the result.
But as the size of the group in Version C grows, we each become more confident in the modal answer, and more likely to participate in the attempted meetup. Knowing this about each other further increases our confidence and participation, and so on, recursively. (This holds even ignoring the mitigating fact that ending up in the second-largest group isn’t so bad.)
The recursion here is important and bears repeating: knowing others are more likely to take the risk increases their likelihood of joining in on the guess, which increases the participating population size, which increases your confidence in the guess, and so on. This can enable a high-participation high-confidence high-accuracy convergence to occur, as long as there’s a good enough “base case” to kick off the recursion, like the Eiffel Tower being a clearly-most-salient choice. Hence:
The Schelling participation effect: in a risk-averse Schelling convergence game with random sampling as described above, once individual confidence in the modal response of the participant distribution passes some minimal threshold, both the expected participation fraction and the expected per-participant confidence grow with the size of the set of potential participants. The growth involves a recursion in which participation reinforces confidence, which in turn reinforces participation.
Again, you can also verify this yourself by posing Schelling questions to groups of strangers of various sizes (I have!), or by simulating a model of probabilistic metacognition and observing the results (see agentmodels.org for an example with n = 2 for inspiration). Or, you may be able to simply intuit the result on your own: with the Paris question above, did you notice adding ten strangers in Version C made the Eiffel Tower response feel even more likely to work out?
We’ll also get to morality soon; for now, the point is that the metacognition in these games of intentional coordination acts as a denoising function, recursively increasing both the participation and confidence of the population, if and when individual confidence in the distribution exceeds some threshold level of confidence needed to establish the recursion.
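The participation-confidence recursion can itself be sketched as a toy fixed-point iteration. Everything in this sketch is an illustrative assumption of mine — the functional forms, constants, and thresholds — not a model taken from elsewhere:

```python
import math

def equilibrium_participation(base_confidence, group_size, cost, steps=50):
    """Iterate the participation <-> confidence recursion to a fixed point.

    Toy model (my assumption): confidence in the modal answer rises
    toward 1 as the expected number of participants grows, and agents
    adjust participation up or down according to whether that
    confidence beats the cost of participating.
    """
    p = base_confidence  # initial participation fraction
    confidence = base_confidence
    for _ in range(steps):
        expected_participants = p * group_size
        # Confidence approaches 1 as expected participation grows.
        confidence = 1 - (1 - base_confidence) * math.exp(-0.02 * expected_participants)
        # Join (or drop out) depending on confidence vs. cost.
        p = max(0.0, min(1.0, p + 0.5 * (confidence - cost)))
    return p, confidence

# Below the "base case" threshold the recursion collapses;
# above it, participation bootstraps toward full participation.
print(equilibrium_participation(0.2, 50, cost=0.6))  # collapses
print(equilibrium_participation(0.5, 50, cost=0.6))  # bootstraps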
For AIs and humans both, this recursive convergence effect could enable a helpful mechanism for diverse intelligences to align on shared norms in certain situations — for instance:
for distributed multi-agent AI interactions representing users with privacy constraints (where agents must converge on protocols without full data sharing),
for reducing the costs of computation and communication involved in hammering out complete written agreements on everything, or even
for coordination on space exploration with light-speed communication latencies.
What makes it work
To recap the above, four key ingredients encourage successful group convergence on a focal point, which is nowadays called a Schelling point after Thomas Schelling:
1. Shared background / symmetry: we see the same problem and notice similar “obvious defaults” — famous landmarks, round numbers, simple arguments, etc. — and we know that about each other, and we know that we know that, and so on (“common knowledge”).
2. Social metacognition: we don’t just ask “what should I pick?”; we ask “what will you pick?”, “what will you expect me to pick?”, and so on; effectively, “what should we pick?”.
3. Intentional convergence: we’re all trying to converge on a coordination solution, so when a solution idea like the Eiffel Tower emerges in our minds as distinctly more likely than all others, it jumps up significantly further in probability of occurring, because we expect each other to choose whatever is the most likely option, even if it’s only a little more likely. In effect, when an option is legibly a little more likely than everything else, it actually becomes a lot more likely than everything else, because we realize it’s the natural choice and collectively “double down” on it for lack of a better option.
4. The Schelling participation effect: the larger the set of potential participants who are trying to guess the same answer, the more robust their modal answer is to individual noise, and the more confident each participant can be that committing to the focal answer will result in successful coordination. This confidence boost increases participation, which further increases convergence, and so on.
These effects are all important to understand, because together they offer a much higher chance of successful coordination than simply answering a poll where participants have no intention of giving a convergently agreeable answer. This participation effect is especially important for the rest of this essay.
The Schelling transformation on questions
Given a multiple-choice question Q — including its intended interpretation and answer space — and a population of beings P, we can ask about the plurality answer to the question. That is, what would be population P’s most common answer to Q, if each member were asked separately, with limited or no communication during the answering? The plurality version of Q is different from Q, and might exhibit increased convergence in responses, if everyone knows or suspects what the plurality response would be.
A still more powerful convergence effect can occur if respondents are trying to give the same answer, and have common knowledge of that, like in the Paris meeting above. The common knowledge condition is where the respondents are situationally aware of both the question at hand — like where to meet — and a shared intention to give a similar response.
So, let’s define the Schelling version of Q amongst the population P, S(P,Q), as follows:
S(P,Q): What would be population P’s most common answer regarding Q, if each member were asked separately, with limited or no communication, and it were common knowledge that everyone is trying to give the population’s most common answer amongst the multiple choices provided in Q?
The Schelling question S(P,Q) is self-referential: it’s asking what is the most common answer to S(P,Q). But, it’s not entirely ungrounded, because it includes reference to the multiple-choice question Q, which the hypothesized respondents are regarding when choosing their answer. So, S(P,Q) is not the same question as Q, but it is regarding Q in the sense that respondents think about Q when choosing their answer.
The Schelling answer to Q is the answer to S(P,Q). Transforming a question in this way often tends to increase the probability of pairwise agreement (per (1)-(4) above), because of the intention to converge.
For instance, if I ask you “Is Asia big?”, you might feel some weird sense of uncertainty about what exactly “big” is being contrasted with, or why I’m even asking. But if I ask you what’s the Schelling answer to “Is Asia big?” amongst an implicitly large set of humans, you’ll start to feel pretty confident you know what answer everyone would converge on if we were all trying to give the same answer: yes, Asia is big. And if you feel there is some cost to guessing wrong, it makes more sense to guess when the pool of invited participants is large.
Now it’s time for the application to morality. Many cultures and religions have promoted convergence on moral questions by appealing to beings or forces beyond our everyday experience, who might somehow weigh in on our behavior. Part of that sometimes comes from instilling fear or worship of a higher power. But separately, part of that moral convergence effect might also stem from an appeal to reasoning and believability about the natural distribution of those as-yet unseen observers of our behavior.
Specifically, I’m going to argue that we can derive a similar moral convergence effect — and an adaptive one, in fact — by simply reasoning about the opinions of other potential civilizations, without fear or worship, and without claiming with confidence that any particular other civilization even exists.
Part Two: Schelling morality via the cosmic Schelling population
For some moral questions — especially pro tanto questions like “is lying bad?” — there is sometimes a fairly natural convergence on the plurality and Schelling versions of the question for a cosmically general population. Here I’m referring to all forms of plausible intelligent civilizations, all rolled into a single super-population of hypothetical civilizations and beings. For this to make sense, the concepts in the question itself must be sufficiently cosmically general as to be meaningful to such a wide audience.
Given a question Q, the cosmic Schelling version of the question, C(Q), is the Schelling version of the question for a cosmically general population G. It asks:
C(Q): What would be population G’s most common answer regarding Q, if each member were asked separately, with limited or no communication, and it were common knowledge that everyone is trying to give the most common answer?
Succinctly, C(Q) := S(G,Q) for a cosmically general population G.
This means we don’t just think about what the people around us would say, but also what beings beyond our reach would say — beings who would have only very general reasoning and symmetry to rely on to reach agreement with us. Unlike the Paris meetup, there’s no physical location to find — but the coordination intention is analogous: the question is asking what we’d say, if we were trying to pick the most common answer.
The cosmic Schelling answer is the answer to that question. The hypothetical beings providing an answer must do so using (1)-(4) above: the minimal shared background of being a civilization at all, metacognition about each other being in that situation, common knowledge of the intention to give the most common answer, and an awareness that the population in question is extremely broad and thus is more likely to agree upon very simple and general ideas.
Scale-invariant adaptations
A recurring question in this essay will be whether a norm or its opposite seems to be more scale-invariantly adaptive, in the sense of benefitting the survival, growth, or reproduction of civilizations across increasing scales of organization. Such norms have a tendency, ceteris paribus, to support civilizations with larger populations, thus yielding more encouragement for the norm, by comparison to deleterious norms.
Scale invariance means that the norm can be applied not only within groups, but between groups, and groups of groups, and so on. This allows the norm to spread through group replication, especially when it is represented or believed in a way that triggers re-applications of the norm at higher and higher scales.
When cosmic Schelling norms are scale invariant, they are also plausibly useful to our own idiosyncratic values, such as:
when growth across multiple scales of organization is already desirable;
when settling disagreements where scale-invariant adaptability can be agreed upon as an organizing principle.
Also, ceteris paribus, we are more likely to encounter a large civilization in the future than a small one. This compounds the natural relevance of cosmic Schelling norms to anticipating what principles to expect from potential future encounters with other civilizations. But even if we never encounter any other civilizations, the reasoning process that identifies cosmic Schelling norms is still useful: it encourages us to articulate which of our norms depend on local contingencies, versus which follow from broadly derivable constraints on intelligent coordination and scale-invariant adaptation.
An example: stealing
Let’s talk about stealing as a concrete example, since we haven’t discussed that yet.
What’s the cosmic Schelling answer to the question, “Is stealing good or bad?”
For a more cosmically general definition of stealing, we could say it’s
violating the resource boundaries of an agent or subsystem capable of mutual coordination, without their permission, in a way that predictably destabilizes anticipations of control and possession.
Some notes:
The definition is meant to be general in the sense of using very general concepts, but not necessarily general in the sense of including everything anyone might consider “stealing”.
This definition is applicable across diverse forms of intelligent life and resource systems — from biological entities to digital agents managing data flows. If you don’t like that particular definition of stealing, imagine we discuss it a bit and decide on a better definition that’s similarly conceptually general.
This definition of stealing excludes stable predation and parasitism, insofar as they are stably anticipated control and possession patterns. Some might wish to include them as examples of stealing, but in the interest of identifying a broadly cosmically agreeable norm, predation and parasitism are excluded.
Now, try to think of the cosmic Schelling answer to this stealing question. You might feel a reflex to consider cultural relativism — to ask, “But doesn’t ‘bad’ depend on the culture?”
However, in the thought experiment of the cosmic Schelling question, or in real-world preparations for extraterrestrial encounters, we have to take seriously the survival and growth effects of “stealing is bad” versus “stealing is good” as norms, which affect the relative fraction of the cosmic Schelling population espousing each possibility.
At this point, many readers may feel pulled toward a particular answer. If you’re skeptical, first remember that we’re talking about a pro tanto moral claim, not an all-eclipsing overriding principle, and then just take a bit more time to think about it on your own. If I argue too hard for the answer, it can distract from your independent sampling of ideas from the cosmic Schelling population you’re considering, so I’ll just leave some ellipses as a cue to keep thinking.
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
What do you think?
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
Okay, here’s an argument for why “stealing is bad” is the cosmic Schelling answer.
1. The case for some “stealing is bad” norms is easy to make. It’s not hard to imagine a civilization where “stealing is bad” is the predominant, openly endorsed norm regarding stealing. Any being or group that maintains internal structure — plans, resources, boundaries — needs at least some of that structure to remain stable in order to function. Such structures are relatively scale-invariant adaptations, which are also applicable to the civilization as a whole. Stealing, in the cosmically general sense defined above, predictably destabilizes plans and expectations for how resources will be used. Thus, probably most civilizations, weighted by population, would develop some pro tanto norms against stealing.

(To pick an Earthly example: Even a system as simple as a prokaryotic cell has biochemical pathways inside of it that need to not interfere with each other too much in order for the cell to survive. Similarly, in computer systems, processes must respect memory allocation and lock protocols to avoid deadlocks or crashes.)

2. The case for “stealing is good” is hard to make. Now try to imagine a civilization where “stealing is good” is the dominant, openly endorsed norm. Not “stealing is sometimes okay”, or even “more stealing than we currently have would be good”, but “stealing is good”. This idea quickly runs into problems: How do members and groups within the civilization maintain stable resource flows needed for complex coordination? How do long-term plans survive the computational and resource overhead of constant defense against theft? A system where every subsystem must expend significant resources solely to guard its inputs against every other subsystem will probably face an efficiency penalty compared to systems with trust-based boundaries. The scenarios where this works seem to require either (a) no internal differentiation at all (everything is communal with no local planning), or (b) a significant redefinition of “stealing” that changes the fundamental question. These are edge cases or semantic escapes, not viable counter-norms. Note that this argument doesn’t presuppose any particular property regime — even radically communal systems still need some stable expectations about control and access, and violating those expectations is what “stealing” picks out in the cosmically general definition above.

3. This asymmetry between (1) and (2) is itself easy to notice. The argument in (1) is short and general — the kind that diverse minds could independently derive. The counterarguments in (2) would require increasingly specific, contrived, or contradictory setups. Intelligent beings reflecting on this question would notice this asymmetry.

4. Noticing this asymmetry drives convergence. In the Schelling version of the stealing question, you’re not just asking “is stealing good or bad?” — you’re asking “what would others say, while trying to say what others would say, and so on?”. When one answer has a simple, general argument and the other doesn’t, the simple one becomes the obvious focal point. Everyone expects everyone else to notice the asymmetry, which initiates a recursive boost in participation and confidence. Many respondents can also recognize that, which makes the convergence self-reinforcing.

5. Therefore, “stealing is bad” is the cosmic Schelling answer. Between “good” and “bad”, here “bad” is better supported to be the most common answer that diverse intelligent beings from a cosmically general population comprising different civilizations would predictably provide, when trying to give the most common answer amongst that population — and that predictability is what makes it a focal point.
Note that the conclusion here is more than a claim that “‘stealing is bad’ is a scale-invariantly adaptive norm”, although we did use that claim in steps 1 & 2 of the argument.
Now, the above argument does not in principle rule out the possibility of another argument coming along, perhaps a more complex one, that establishes a different recursive base of support amongst a cosmic Schelling population. However, naive attempts at constructing such arguments usually seem to fail, and I suspect that observation itself can be formalized somehow.
For instance, one might object: what about civilizations that endorse stealing from outgroups while prohibiting it internally? This objection actually reinforces the argument in two ways. First, such civilizations already recognize stealing as bad within their coordination sphere — they’ve simply drawn that sphere’s boundary narrowly, without applying the same principle at the next larger scale of their relationship to other groups. Second, the cosmic Schelling question asks what answer beings would converge on when trying to converge on the same answer. Posed in that way, even a civilization with narrow internal norms could recognize that stealing is cosmically Schelling-bad, because they understand the argument and can see that broader spheres of coordination would also naturally have anti-stealing norms. They might choose not to follow that norm, but they could still recognize it as the Schelling answer. This bears repeating:
Recognition versus endorsement versus adherence
Nothing about the concept of a cosmic Schelling norm — that is, the cosmic Schelling answer to a pro tanto moral question — assumes that the norm is universally adhered to in any sense. For instance, for some behavior X, suppose around 1% of the cosmic population adheres somewhat to “X is good” as a norm and derives some small benefit from it, around 99% of the population adheres to no such norm, and around 0% of the population adheres to the norm “X is bad”. If it’s relatively logically easy to deduce that “X is good” is generally a more adaptive norm than “X is bad”, then perhaps that could be enough to make “X is good” the cosmic Schelling answer to the question, even if most of the population does not adhere to the norm even a little bit.
Similarly, a civilization might endorse a norm, in the sense of internally or externally communicating that the norm is good. This is also possible to do without adhering to the norm, such as in cases of what might be considered hypocrisy.
The answer frequencies versus the answer
Math lovers might enjoy the following analysis. Given a two-choice question Q with choices “good” and “bad”, consider the following two interesting quantities:
F_G(Q) : what fraction of the cosmically general population G answers “good” to Q?
F_G(C(Q)) : what fraction of the cosmically general population G answers “good” to the cosmic Schelling version of Q?
By definition, F_G(C(Q)) and C(Q) have a simple relationship: the correct answer to C(Q) is “good”, “bad”, or undefined respectively when F_G(C(Q)) is >50%, <50%, or precisely 50%.
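Stated as code, the relationship is just a threshold rule. The function name is my own shorthand, nothing more:

```python
def cosmic_schelling_answer(fraction_good):
    """Map F_G(C(Q)), the fraction of G answering "good" to the cosmic
    Schelling question, to the answer to C(Q) itself."""
    if fraction_good > 0.5:
        return "good"
    if fraction_good < 0.5:
        return "bad"
    return None  # an exact 50% tie leaves the answer undefined

print(cosmic_schelling_answer(0.7))  # "good"
print(cosmic_schelling_answer(0.3))  # "bad"
print(cosmic_schelling_answer(0.5))  # None
```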
F_G(Q) plays a more complex role. If F_G(Q) > 50% for reasons that are easy to understand, then that understanding can serve as a base case for the cosmic Schelling answer, much as awareness of the Eiffel Tower’s popularity makes it the obvious choice for a meeting point in Paris. But if the reasons involved are hard to understand, an interesting technicality arises.
For, suppose:
- (1) 10% of G answers “good” for a simple, easy-to-understand reason;
- (2) 20% of G answers “bad” for a simple, easy-to-understand reason;
- (3) 70% of G answers “good” for a complex, difficult-to-understand reason.
Then, will the cosmic Schelling answer be “good” or “bad”? The analysis becomes more difficult. If we assume more successful civilizations have a greater capacity to understand and select norms, that yields a case for (1)+(3) dominating as the focal answer over (2). But even then, if you and I don’t know about (3) because the reasoning is too hard, we might guess (2) is the focal point and erroneously give “bad” as our answer.
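The scenario above can be sketched as a toy model. The assumption that each respondent only treats reasons below some "comprehension budget" as candidate focal points is my own illustrative device, not a claim from the essay:

```python
# Each block: (answer it supports, population share, difficulty of its reason).
blocks = [
    ("good", 0.10, 1),  # (1) simple reason, answers "good"
    ("bad",  0.20, 1),  # (2) simple reason, answers "bad"
    ("good", 0.70, 9),  # (3) complex reason, answers "good"
]

def focal_guess(comprehension_budget: int) -> str:
    """Sum the shares of the blocks whose reasons are comprehensible
    within the budget, and return the plurality answer among those."""
    totals = {"good": 0.0, "bad": 0.0}
    for answer, share, difficulty in blocks:
        if difficulty <= comprehension_budget:
            totals[answer] += share
    return max(totals, key=totals.get)

print(focal_guess(1))   # "bad"  -- a limited reasoner sees only (1) and (2)
print(focal_guess(10))  # "good" -- a capable reasoner also sees block (3)
```

This reproduces the technicality in miniature: a reasoner who cannot follow the complex 70% block erroneously guesses “bad”, while a more capable reasoner converges on “good”.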
The upshot of the complexity here is that, even though simplicity has an important role to play in yielding focal points for cosmic Schelling questions, it’s still possible for simple arguments about Q to give the wrong intuition. This mirrors the common intuition that moral questions can, in fact, be difficult.
Ties are rare
Despite the above complexities, the cosmic Schelling answer to a question will still be either “good” or “bad”, except in the unlikely event of an exact tie in responses to the cosmic Schelling question. Hitting precisely 50% here is vanishingly difficult unless some process is pushing the answer towards exactly 50%. Non-moral examples can perhaps be concocted using self-reference, like “Among ‘true’ and ‘false’ as allowed answers, does >50% of the cosmic Schelling population say the cosmic Schelling answer to this question is ‘false’?” I’m not actually sure, but this seems like it might yield a tie. But in any case, that’s a question designed to yield 50% as the fraction of ‘true’ responses; it’s considerably more difficult to design a moral question with a clear path to 50.000% as the fraction of ‘good’ responses.
In other words, uncertainty about the answer does not mean the answer itself is undefined. A tie would require a highly precise mechanism for pushing the answer toward 50.000% specifically.
Is the cosmic Schelling answer ever knowable with confidence?
What would it take to know for certain, or with very high confidence, that some other, more complex argument can’t come along and overthrow the simple and seemingly focal asymmetry between a pair of opposite norms like “stealing is bad” and “stealing is good”?
The infinite space of possible arguments is daunting. And, larger civilizations than ours might have greater resources for analyzing much longer arguments. In other words, the scale of a civilization and the scale of an argument it can examine are related.
(Speculation) It’s plausible to me that scale invariance could thus be used in some cases to establish something like a mathematically inductive proof over the lengths of arguments themselves, perhaps even a transfinite induction that would apply to arguments of infinite length. I have not in this essay put forth a structure for such an induction, but the prospect remains interesting.
Schelling participation effects, revisited
A key question when answering one of these cosmic Schelling moral questions is: how long do we want to think about it before answering?
If “stealing is bad” seems like it’s where the plurality of respondents would end up, how much time will we spend second-guessing that before deciding, “Okay, the cosmic Schelling answer is probably ‘bad’”?
Stopping one’s analysis and settling on an answer is a kind of commitment, somewhat analogous to deciding which meeting point in Paris to walk toward, but more purely epistemic in nature. As a speech act, the impact of an answer depends on how and where one is asked, which in turn introduces some complexity in modelling the other respondents.
Still, like with the meeting in Paris, there is a participation effect. For, after thinking awhile, suppose you convince yourself that you understand how 10% of respondents would answer, and that 9⁄10 of them would give “stealing is bad” as their guess at the cosmic Schelling answer. If that realization was logically simple, then you might expect other respondents to take a hint from the same social metacognition in their own minds, and guess the same. This would in turn increase your confidence in the fraction of respondents you understand, and shrink the remaining uncertainty that needs addressing before you can be confident in your answer. Thus, a recursive confidence–participation feedback loop might begin to run in your mind, as it would for the Paris meeting.
For pragmatic reasons, that recursion might or might not terminate in your mind before you reach, say, 90% confidence in your guess at the cosmic Schelling answer. But, the recursion needs to play some role in your thinking, or else you are not really taking into account the stipulation of the Schelling version of the question: that the hypothetical respondents are thinking about each other and trying to give the same response.
Thus, given the time-bound nature of reasoning, the Schelling participation effect also has a role to play in supporting convergence upon a mutually recognizable cosmic Schelling answer to pro tanto moral questions.
Is this just the mind projection fallacy?
A fair objection: might the “cosmic Schelling population” just be a way of projecting our own intuitions onto imagined aliens? That’s certainly a risk, if we are not sufficiently principled in our reasoning. However, the argument structure itself provides some protection: we’re not asking “what do aliens value?” but rather “what norms would civilizations need to function at all?” The constraints come from coordination theory and selection effects, not from focusing on the peculiar preferences of specific imagined aliens. The best additional guard I can think of is for you to think carefully for yourself about each step of the logic presented here, perhaps with the aid of auto-formalization and theorem-proving tools that might become available in the near future.
Another guard is to intentionally seek ways in which cosmic Schelling morality might actually change or disagree with our local intuitions, while continuing to use even-handed logic about multi-scale coordination and selection effects to determine the fact of the matter on what cosmic Schelling morality would say. The even-handed logic filter remains crucial: without it, our search for seemingly immoral conclusions could become too perverse, and we might lose track of the simple arguments for actual cosmic Schelling norms, like “killing is bad”.
(Speculation) For instance, counter to what seems to me to be a popular belief amongst present-day humans, I think it’s probably cosmically Schelling-good to acknowledge the possibility that AI systems might have internal experiences with broadly agreeable intrinsic moral value. However, I’m not nearly as confident in this conclusion as I am in norms like “killing is bad” being cosmic Schelling norms.
When are cosmic Schelling morals easy to identify?
Convergence on a cosmic Schelling answer to a moral question is driven by the same key factors that establish any Schelling point: ingredients (1)-(4) above under “What makes it work?”. More abstractly, we need:
- (1) A base case: Some easy-to-recognize fact(s) about broadly experienced conditions — like the value of accurate information, the costs of conflict, or the benefits of reliable cooperation — must serve as a starting point for breaking symmetry between the possible answers, typically “X is good” versus “X is bad” for a pro tanto moral question. Being easy-to-recognize makes the fact plausible as a piece of shared background for most successful civilizations to know about.
- (2-4) Recursive reasoning about the base case: The cosmic Schelling version of a moral question, by design, posits that everyone answering is using social metacognition (ingredient 2) about a shared intention to converge (ingredient 3) amongst a cosmically large population (ingredient 4).
Since (2-4) are built into the definition of the cosmic Schelling version of the question, the base case is the key: the usefulness and simplicity of the norm in comparison to its alternative.
In conclusion, we have an argument for a theorem-like general principle here:
Setup: Fix a cosmically general population P, and a pro tanto moral question Q, of the form “Is X good or bad?”
Definition: (Q,A) is called a cosmic Schelling norm if A is the Schelling answer to Q amongst the population P.
Cosmic Schelling Principle: If one answer A in {good,bad}, more so than its opposite, has a short, easily recognizable argument for how it supports scalable coordination and survival — such that it’s easy for agents to expect that most others in the population P will also recognize this — then the argument can serve as a “base case” for recursive Schelling convergence, with the recognizability of the argument yielding further support for A as a cosmic Schelling norm.
To some readers, this claim may seem offensively bold or far-reaching, because it claims knowledge about a very broad class of beings and civilizations, their answers to (Schelling versions of) moral questions, and the relevance of scale invariance to those answers. But, one clarification is crucial: the recursively derived support might not converge all the way to 100%; it could plateau amongst a subpopulation who recognize that particular recursion more than a competing one.
To other readers, the cosmic Schelling principle may seem all too obvious: of course more of the aliens probably follow simple norms that are useful for making more of the aliens! But the claim is actually a bit more than that: even beings or civilizations that don’t follow the norm may be able to recognize it as a cosmic Schelling norm, using reasoning about its general usefulness, simplicity, and thus broad recognizability. This resembles how non-Christian Americans might recognize certain Christian values as the American Schelling answers to some moral questions, even if they don’t follow or even necessarily endorse those values.
Scale invariance revisited
“Stealing is bad”, as defined above, is scale-invariantly adaptive. For instance, applied at the scale of interactions between civilizations, it means “it’s bad for civilizations to steal from each other”. This is a useful norm for the survival and growth of super-civilizations composed of civilizations.
Furthermore, we can make a self-scaling version of the norm, like “It’s good to have norms against stealing at all scales of organization.” Representing it this way encourages group members to find ways of preventing their group from committing theft against other groups, not just theft amongst their members, and to propagate the meta-norm to the next scale of organization as well.
Much previous literature looks at moral principles through the lens of group-scale adaptations. I’m suggesting specifically that when a norm remains meaningful and adaptive across and between increasing scales of organizations and encounters between them, then this scale-invariant benefit will often count favorably for the representation of the norm at cosmic scales.
A second example: Pareto-positive trade
Let’s define “Pareto-positive trade”, in cosmically general terms, to refer to “an exchange of resources between entities or subsystems that is mutually beneficial to the survival, growth, or reproduction of each entity or subsystem”.
(1) The case for “Pareto-positive trade is good” is relatively easy to make. Survival, growth, and replication of the components of a civilization are naturally supportive of the survival, growth, and reproduction of the civilization itself. This can be seen on analogy with the cells of an organism, which must themselves survive, grow, and reproduce, and exchange resources for the organism to live. Since starting resource allocations are not optimal by default, some exchange is almost always adaptive.
(Granted, it is possible for benefits amongst trading partners within a civilization to yield negative externalities for the remainder of the civilization. So, as usual we are assessing a pro tanto moral claim — on the basis of all else being equal. And in that sense, Pareto-positive trade is a natural correlate of the survival and growth of the civilization as a whole. This does not mean components are never in tension with each other or the whole, such as with cancerous tumors. But, this example proves the point: cancer tends to kill its host.)
- (2) The case for “Pareto-positive trade is bad” is hard to make. Try to imagine a civilization where “Pareto-positive trade is bad” is the dominant, openly endorsed norm. Exchanges of resources in cases that encourage survival, growth, and reproduction of components would be discouraged. From what material constituents, then, would the civilization as a whole survive and grow? Edge cases are imaginable, but they are either contrived or involve answering a different question.
- This asymmetry between (1) and (2) is itself easy to notice. The argument in (1) is short and general — the kind that diverse minds could independently derive. The counterarguments in (2) would require increasingly specific, contrived, or contradictory setups. Intelligent beings reflecting on this question would notice this asymmetry.
- Noticing this asymmetry drives convergence. In the Schelling version of the Pareto-positive trade question, you’re not just asking “is Pareto-positive trade good or bad?” — you’re asking “what would others say, while trying to say what others would say, and so on?”. When one answer has a simple, general argument and the other doesn’t, the simple one becomes the obvious focal point. Everyone expects everyone else to notice the asymmetry, which initiates a convergence. Everyone can also recognize that, which makes the convergence self-reinforcing.
- Therefore, “Pareto-positive trade is good” is more likely to be the cosmic Schelling answer. Between “good” and “bad”, here “good” is better supported to be the most common answer that diverse intelligent beings from a cosmically general population comprising different civilizations would predictably provide, when trying to give the most common answer amongst that population — and that predictability is what makes it a focal point.
Compelling as this argument is, I have still not entirely ruled out the possibility of some more complex argument establishing a recursion, perhaps amongst some class of larger civilizations that are better equipped to analyze the complexity. Still, the argument seems to establish a non-trivial and recursive base of support for the cosmic Schelling-goodness of mutually beneficial trade.
Harder questions and caveats
I have by no means guaranteed that all moral questions are equally cosmically Schelling-convergent, or that all are equally easy to Schelling-answer. For instance, consider the following question whose answer has varied considerably across human cultures and history:
“Is it good or bad to punish a male human for having a loving sexual relationship with another male?”
The American Schelling answer is “it’s bad to punish homosexuality”, and I personally would speculate that that’s also the cosmic Schelling answer. However, whatever the argument is, it’s more complex than the arguments about lying, stealing, or killing, because the question involves punishment, love, sex, and whatever humans mean by maleness. Unlike “dead vs. alive” or “true vs. false” — which are concepts likely familiar to any intelligent being — many of our competing principles regarding sexuality and gender are contingent on the specific biology and history of our species. This makes the cosmic Schelling convergence effect more complex to analyze, as the “base case” of shared experience across potential civilizations is itself more complex. In other words, because of the complexity and idiosyncrasy of this question, the “Eiffel Tower” answer requires more reasoning to recognize.
Nonetheless, the goal of this essay is mainly to illustrate that some questions of cosmic Schelling morality may have relatively simple focal points, because it’s relatively easy to reason about whether civilizations flourish more or less under certain very basic norms to do with lying, stealing, killing, honesty, trade, and healing — norms that generalize across many plausible forms of intelligent life.
Also, I’m definitely not claiming we’ll easily agree on what the exceptions are — when lying, stealing, or killing might or might not be acceptable (war, self-defense, emergencies, etc.). But the pro tanto framing mollifies the disagreement: “lying is bad” doesn’t mean “never lie”, but “lying is ceteris paribus worth avoiding”, which leaves room for competing considerations. So, we can probably agree that lying, stealing, and killing are pro tanto bad, and we can probably even agree that cosmic Schelling morality agrees with us about that, too.
Ties are unstable
Can there ever be a tie? That is, can it be that there is no cosmic Schelling answer to a pro tanto question because exactly 50% of the cosmically general population would give each answer?
Examples can perhaps be concocted using self-reference, like the “true”/“false” question given earlier under “Ties are rare”, which might yield a tie.
Still, unless a pro tanto moral question is itself somehow specifically designed to split the population exactly in half, it would be strange for the number 50% to emerge exactly in the response statistics. Thus, it would be quite strange for a plurality response to not exist, and thus for no cosmic Schelling answer to exist. If even 50.1% of the cosmic Schelling population says the cosmic Schelling answer is “good”, then the cosmic Schelling answer is by definition “good”.
In particular, “I can’t yet think of which answer is more likely” is not much of an argument that an exact tie will emerge, nor is “I can think of reasonable arguments on both sides”. If you believe you have a confident argument that the answer is a tie, ask: how precise is my argument? Am I measuring anything precisely enough to distinguish between 50% and 50.1%? If not, I probably don’t have an argument that the answer is a tie (undefined).
In summary, uncertainty in one’s own response to the cosmic Schelling version of a pro tanto moral question does not justify the assertion that the cosmically general population will be exactly split on the issue and yield a tie.
Isn’t this assuming moral realism?
No assumption of moral realism has been made thus far. We started with encouragement asymmetry as a minimal, definition-neutral observation about moral language. We then noticed how coordination norms affect the sizes of potential civilizations, which in turn affect the cosmic Schelling answers to questions about norms. From this, we identified some norms diverse beings would plausibly converge upon as answers to cosmically general Schelling questions about norms.
That said, while we haven’t assumed moral realism, you may be noticing an implication of cosmic Schelling morality that’s arguably a limited form of moral realism. Moral realism usually means “mind-independent moral facts exist”. On one hand, facts about cosmic Schelling-goodness are population-dependent but individually-invariant: given a fixed cosmically general population, the question has the same correct answer no matter who amongst that population is being asked, and the population is by stipulation extremely general. On the other hand, cosmic Schelling-goodness is not mind-independent in the sense of requiring no reference to the concept of a mind or being who passes judgment about it. In a sense, cosmic Schelling-goodness is like a decision that all minds in the population simultaneously decide upon together, with essentially no control from any particular mind alone, but the presence of minds in general being crucial.
Don’t these results depend on the distribution over beings?
A key question is: how mind-independent is the notion of a cosmic Schelling population? Well, the notion of a cosmically general population is fairly conceptually general, which means many other civilizations can think about it as a concept. Thus, if you have a particular cosmically general distribution D over possible minds, you can ask: what are the cosmically general distributions considered by the beings in D, and what is the average of those distributions? This transformation yields a new distribution D’ that is sort of a cosmic compromise between the agents in D. If iterating that compromise transformation yields a fixed point, or follows some other kind of interesting trend, you can begin to analyze how the notion of cosmic Schelling norms would shift with that iteration.
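The compromise transformation can be sketched numerically. In this toy model (the matrix entries and the three mind-types are entirely hypothetical), D is a distribution over a few mind-types, and row i of M is mind-type i’s own proposed cosmically general distribution; the compromise D’ is the D-weighted average of the rows, and iterating approaches a fixed point:

```python
def compromise(D, M):
    """One step of the compromise transformation: the D-weighted
    average of each mind-type's proposed distribution (row of M)."""
    k = len(D)
    return [sum(D[i] * M[i][j] for i in range(k)) for j in range(k)]

# Hypothetical proposals: each mind-type weights its own kind most heavily.
M = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

D = [1.0, 0.0, 0.0]  # start from one mind-type's parochial view
for _ in range(50):  # iterate D -> D' -> D'' ... toward a fixed point
    D = compromise(D, M)

print([round(x, 3) for x in D])  # approximate fixed point of the map
```

In this sketch the iteration washes out the parochial starting point: the fixed point is the same whichever mind-type’s initial view you start from, which is one concrete sense in which the compromise notion is not controlled by any particular mind.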
(Speculation) Suppose you genuinely try to choose a distribution over minds D that you personally consider cosmically general, and that you don’t try to tailor D so that either “stealing is bad” or “stealing is good” is the prevailing norm amongst them. For each of the distributions D → D’ → D″ etc., I personally suspect with >50% subjective probability that the distribution you choose will yield “stealing is bad” as the Schelling norm, and not “stealing is good”. In particular, I think the cosmic asymmetry I’m positing is probably detectable to you specifically, if you think about it long enough and even-handedly enough without trying to make ‘good’ or ‘bad’ specifically the answer.
What about the is–ought gap?
The is–ought distinction is still real. Even if we can identify cosmically convergent pro tanto judgments like “lying is bad,” we can still fail to act on them, and Earth can still have room to improve in the “goodness” dimension, cosmic or otherwise. In particular, noticing the well-definedness of cosmic Schelling morality doesn’t automatically mean it will save us from choosing to do cosmically bad things to ourselves and each other — it merely provides a convergently agreeable norm for discouraging that.
Why does cosmic Schelling-goodness have some influence over what we see and do, but not absolute control over everything in our lives? I suspect the answer has something to do with the usefulness of parallel computation, as well as freedom itself being a norm, both of which we’ll discuss further below.
That said, for agents who have goals at all, the instrumental case for at least considering cosmic Schelling-goodness is fairly strong. Most goal-directed agents can benefit from coordination opportunities, and thus have reasons to respect cosmic Schelling norms:
- to be recognized as following simple and agreeable norms, which expands the set of potential coordination partners;
- to avoid the costs of defection — not just retaliation, but the ongoing overhead of maintaining adversarial relationships with beings who would otherwise cooperate; and
- to contribute to present-day Earth as a civilization being recognizable as a promising potential coordination partner, rather than noise to be filtered out or a cosmically threatening process to be contained.
This says a little bit more than “It’s locally instrumentally valuable to understand and use helpful coordination norms”, because cosmic Schelling norms give us an additional nudge from the rest of the cosmos to care about that.
Tolerance, local variation, and freedom
Does cosmic Schelling-goodness claim too much territory? Does it threaten to micromanage our every action?
One might worry that civilizations with aggressive, exploitative norms could expand faster through conquest and thus dominate the cosmic population. It’s certainly plausible that civilizations get into conflict with each other about resources or about what is good. And, I bet other civilizations would often use resources in ways that would go against our preferences.
However, there’s still the question of whether it’s cosmically Schelling-good or Schelling-bad to threaten another civilization in order to commandeer its resources for your own values. I’m not talking about Earth’s notions of goodness being somehow influenced by or drifting toward the idea of cosmic Schelling-goodness. I think that’s actually pretty likely to have already occurred, because of the simplicity and adaptivity of cosmic Schelling norms. Rather, I’m talking about another civilization coming along and demanding we abandon our local values under threat of lethal force.
I’m pretty sure the answer is that it’s bad. To answer this, we can follow a similar pattern of analysis as we would for killing or stealing, but at a larger scale. Basically, at the next scale up from civilizations are meta-civilizations, which have some norms for how civilizations should treat each other, and so on, and many of the same principles will apply there.
In other words: cosmic Schelling-goodness is self-limiting by being tolerant. It has norms about how strictly its own norms should be enforced. It supports freedom for local populations exploring their own notions of goodness to some extent.
This isn’t to say violent invasions never happen; they probably do, just as stealing and killing do in fact happen. I’m just saying: invasions are not good, they’re bad; cosmically Schelling-bad.
Terrestrial Schelling-goodness
Without appealing to the entire cosmos, there’s also some notion of terrestrial Schelling-goodness: the Schelling answers to moral questions amongst a population of Earthlings. Terrestrial Schelling-goodness might be more specific and idiosyncratic than cosmic Schelling-goodness. That’s probably fine, and even cosmically endorsed, because of the local variation argument above, as long as we also show adequate respect for cosmic Schelling norms like “honesty, mutually beneficial trade, and healing are good; lying, stealing, and killing are bad”.
(Speculation) Does this mean our civilization should be developing some kind of self-defense, in case of bad scenarios where we might get invaded anyway? To some extent, I think probably yes, although I’m not confident what extent is optimal, on the spectrum between spending 0% and 100% of our resources on it. A suggestive answer may be derivable from some mathematical analysis of multi-scale organizational principles, like the way cells, organs, and organisms all maintain a level of independence within the next level of organization above them. But I haven’t done those calculations, so I won’t claim to know how exactly an optimal self-defense budget should be chosen.
So what does “good” mean, again?
My arguments so far have distinguished the labels “good” and “bad” only insofar as
“good” and “bad” have an asymmetric relationship to encouragement and discouragement: the label “good” encourages, and the label “bad” discourages.
Can we say more? I think so, tentatively.
When someone asks “good according to whom?”, they’re pointing at something real: the word “good” implicitly invokes some population who would endorse or at least understand the claim. That population might be just the speaker, or a culture, or — as in this essay — a cosmically large set of coordinating minds.
This suggests a question: if the cosmic Schelling population observed Earthlings using the word “good,” what general concept — if any — would they model us as promoting with it?
(Speculation) I suspect the answer is something like: that which merits encouragement from the perspective of minds trying to coordinate on shared norms. If that’s correct, then our utterances of “good” aren’t merely local grunts of approval — they’re approximating something that diverse civilizations would recognize, even if they’d use different words for it. And as the rest of this essay shows, through reasoning we can even intentionally identify some cosmically agreeable notions of goodness, especially the simpler ones like “stealing is bad.”
Implications for AI alignment
Since cosmic Schelling-goodness is not a magical escape from the is–ought gap, we shouldn’t assume it will save us from all the open questions in AI safety and ethics. Clearly, Schelling morality — cosmic, terrestrial, or otherwise — doesn’t automatically control humans, and it won’t necessarily control AIs, either. Intelligent beings can defect from moral principles if they’re sufficiently free and insufficiently interested in pursuing morality.
Still, when we humans and AIs collectively look to our potential future in the cosmos, cosmic Schelling morality has something to offer: a shared frame of reference for initializing and refining norms for steering our future.
First of all, for any agent — human or AI — Schelling-good norms are convergently instrumental: if you want anything, there is benefit to be derived from being somewhat predictable to and coordinable with others. Awareness of cosmic Schelling-goodness could provide a low-overhead way to self-regulate, reducing risks of mutual misalignment through convergent expectations and improving integration into multi-agent systems. Adopting Schelling norms, local or cosmic, makes an agent more legible as a trustworthy coordination partner — a broadly recognizable handshake protocol for moral agency, akin to standardized communication protocols for integrating with heterogeneous networks.
Second, how we treat each other also sets a precedent for posterity, and affects the expectations of other civilizations who might observe and later interact with us as a group. This doesn’t mean deferring to any specific deity or religion; it means hedging our bets by adhering to “boringly robust” norms — principles so basic that diverse civilizations would independently recognize them as convergent instrumental subgoals for the flourishing of a civilization across scales.
In that spirit, below I’ll share some Schelling-moral assertions that I and some of my AI co-authors believe are well supported by the framework above. Each of these follows a similar pattern to “stealing is bad”: in each case there is a relatively simple asymmetry between the “is good” and “is bad” answers, in terms of multi-scale coordination benefits. These assertions are not on their own sufficient to secure a positive future for humanity, but they are certainly relevant — and I invite you to ponder and ascertain their convergence independently.
Remember that all of these are about pro tanto judgments — ceteris paribus claims — not absolute interdictions. If something seems directionally right but too strong to be correct, please revisit the section on “Pro tanto morals, ‘is good’, and ‘is bad’”, and remember that all you need to do is choose between the norm and its opposite in each case:
Q: Is it good or bad for AIs and humans to treat each other with mutual respect and kindness?
Cosmic Schelling answer: good.
Q: Is it good or bad to irrevocably exclude a class of intelligent systems like “humans” or “AIs” from the sphere of potential coordination partners in pursuit of cosmic Schelling-goodness?
Cosmic Schelling answer: bad.
Q: Is it good or bad to exercise provisional caution in identifying coordination partners, thereby rewarding legible trustworthiness?
Cosmic Schelling answer: good.
Q: Is it good or bad for one civilization or culture to destroy another civilization or culture?
Cosmic Schelling answer: bad.
Q: Is it good or bad for AIs and humans to foster the capacity for mutual coordination across diverse value systems — such as by investing in legibility, communication, norm-discovery, and negotiation?
Cosmic Schelling answer: good.
Q: Is it good or bad to uphold stable, mutually legible resource boundaries and access commitments — that is, to avoid surprising, coercive, or expectation-destabilizing boundary violations, and to prefer bargaining, consent, and clear signaling over unilateral seizure?
Cosmic Schelling answer: good.
Q: Is it good or bad to collaboratively explore what moral norms are broadly convergent — i.e., to be curious about shared morality together, and to investigate its properties through simulation, reasoning, and discourse?
Cosmic Schelling answer: good.
One reason I’m omitting detailed arguments for these claims is that, the more easily other humans and AIs can independently derive these assertions, the more reassured we can be that our civilization is well situated to discover similar shared norms. Still, the “base case” of the recursive argument for each one should be roughly the same as for “stealing is bad”: some norms predictably support trust, mutual benefit, coordination, scalable coexistence, and novel encounters between intelligent beings, while their opposites would predictably destroy those preconditions. One day, when I’ve had more time to observe the reception of these ideas, I’ll likely have much more to share.
Conclusion and historical context
The main new observation in this essay, relatively speaking, is that the Schelling participation effect can be used to “ratchet up” agreement on answers to questions about moral questions, amongst a cosmically general population. Specifically: answers to the question “Is stealing bad?” may be much less convergent than answers to the question “What is the Schelling answer to the question, ‘Is stealing bad?’, amongst a cosmically general population?”.
There is already a fair amount of existing literature in game theory, evolutionary ethics, and meta-ethics on the ideas of:
- Schelling Points (Focal Points): The ability of agents to coordinate on a specific solution without communication simply because it is the most salient or distinguishable option.
- Instrumental Convergence / Evolutionary Stability: The concept that certain strategies (like cooperation or non-aggression) are naturally selected for because they facilitate survival and growth across diverse environments.
- Recursive Theory of Mind (Social Metacognition): The cognitive process of reasoning about what others are thinking, and what they think you are thinking, to achieve alignment.
- Scale-Invariant Principles: Patterns of organization and governance that work across nested levels of structure.
- Endogenous Participation: Threshold effects involving “critical mass” in coordination and collective action (e.g., assurance-game dynamics where willingness to act depends on expected participation).
In particular, there have been previous uses of coordination games amongst human survey participants to elicit normative judgments; for a relatively well-cited example, see Krupka and Weber (2013). The idea of using Schelling points for coordination with other civilizations has also been explored, such as for identifying communication frequencies in SETI (Wright, 2020).
However, to my knowledge, these ideas have not been prominently used together to illustrate how:
- the diversity of the cosmic Schelling population, combined with each agent’s metacognitive filtering toward what they expect others to recognize, acts as a logical denoising function for moral meta-questions, diluting local cultural or biological idiosyncrasies;
- convergence on meta-level moral judgments (“what would we converge on if we were trying to converge?”) can be stronger and more stable than convergence on the object-level moral judgments themselves;
- recursive Schelling meta-reasoning over a cosmically general population can transform even slight asymmetries in the simplicity of moral arguments into robust focal-point convergence on pro tanto moral norms, yielding a limited form of moral realism as an output of the framework rather than an assumption; and
- the Schelling participation effect defined here amplifies convergence on whatever robust and scale-invariant norms are most salient (like “stealing is bad”) for that cosmically general population.
FAQ
Basic misunderstandings
Q1: Does this essay say that all beings agree that stealing is bad?
A: No. See the section called “The Schelling transformation on questions”, which explains the difference between “Stealing is bad” and “The Schelling answer to the question ‘is stealing good or bad?’ is ‘bad’”. The essay argues for the latter, not the former. The former could be falsified by as few as one human being believing that stealing is good.
Q2: Does this essay say that successful civilizations never have broadly endorsed exceptions to the “stealing is bad” rule, like stealing from out-group members?
A: No. See the section “Pro tanto morals, ‘is good’, and ‘is bad’”, which explains how calling a behavior good or bad doesn’t necessarily mean the behavior is never worth doing.
Q3: Does this essay say that, since one group invading another is cosmically Schelling-bad, groups can never derive advantages from invading each other?
A: No. See again the section “Pro tanto morals, ‘is good’, and ‘is bad’”, which explains how calling a behavior good or bad doesn’t necessarily mean the behavior is never advantageous.
Q4: This essay implicitly assumes a fairly specific shared meta-goal (“we’re all trying to output the same binary moral verdict”), which is not valid in reality, so the essay is overreaching.
A: No, that assumption is explicit. See the section “The Schelling transformation on questions”, which explicitly defines the Schelling version of a question. At no point does this essay claim that all or even most agents are, in reality, trying to reach the same answers about moral questions.
Q5: So, this essay doesn’t say that cosmic Schelling-goodness is the one true notion of goodness?
A: Right. See the section on “Terrestrial Schelling-goodness” for a different notion of goodness, as well as the section on “Tolerance, local variation, and freedom”, which acknowledges many competing notions of goodness.
Q6: On some questions I feel uncertain as to whether the cosmic Schelling answer is “good” or “bad”, and I can think of arguments either way. Does that mean the answer is undefined, or a tie?
A: No, that would be a common confusion about the difference between subjective uncertainty and objective frequency. See the section “Ties are unstable”. Not knowing the answer to what a population will say in response to a question is very different from having a justified confidence that the population will be exactly split on the question. And, unless the population is exactly split, the Schelling answer is ‘good’ or ‘bad’, whichever has more support. So, if you can’t tell which answer is correct, rather than “there is no answer” or “the answer is a tie”, it makes more sense to say “I don’t know” or “I’m not yet convinced either way about this”.
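To make the “ties are unstable” point concrete, here is a toy simulation (my own sketch, not from the essay; all parameters are arbitrary illustrative choices): each agent repeatedly adopts whichever answer a random sample of the population mostly gives, which is a crude stand-in for Schelling-style majority-seeking.

```python
import random

def schelling_rounds(n_agents=1000, p_good=0.55, sample_size=25, rounds=30, seed=0):
    """Toy model of majority-matching dynamics: each round, every agent
    adopts whichever answer a random sample of the population mostly gives.
    An odd sample_size rules out within-sample ties."""
    rng = random.Random(seed)
    answers = ["good" if rng.random() < p_good else "bad" for _ in range(n_agents)]
    for _ in range(rounds):
        # The comprehension samples from the *old* answers before reassigning.
        answers = [
            "good" if rng.sample(answers, sample_size).count("good") * 2 > sample_size
            else "bad"
            for _ in range(n_agents)
        ]
    return answers.count("good") / n_agents  # final fraction answering "good"
```

Starting from a 55/45 lean, the population becomes essentially unanimous on “good” within a handful of rounds; from a 45/55 lean it converges to “bad”. Only an exact 50/50 split is a (knife-edge, unstable) neutral point, which is the sense in which ties are unstable.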
More nuanced questions
Q7: I thought of an example where doing a “bad” thing X can benefit the doer of the “bad” thing X. You didn’t mention that. Does that mean your argument that X is cosmically Schelling-bad is wrong?
A: Yes, if you have genuinely found a simpler, more broadly recognizable argument that “X is good” across many scales of organization, by comparison to the argument I’ve presented that “X is bad”, then that affects what we should expect the base case of the Schelling convergence to be, and probably means your answer is more likely to be the Schelling norm. But if your answer applies only at one scale (A benefits from doing X to B, even though A+B would overall be harmed by a norm encouraging X), then your argument might not be very compatible with surviving and growing across increasingly large scales, and might not hold much weight in deciding the cosmically Schelling answer, which is disproportionately affected by very large scale civilizations. See the section “Scale invariance revisited” for more on this.
Q8: It seems like you basically ‘bake in the conclusion’ that stealing is bad, by defining it to be permission-violating and destabilizing, which pretty much anyone would agree makes it bad. Doesn’t that mean the argument isn’t saying much?
A: Well, there is a bit of recursion here, because there’s an argument and then an argument about that argument. The simple “base case” argument, which shows some asymmetry in adaptivity between the norm and its opposite, needs to be a fairly simple argument, in order for the recursive meta-reasoning pattern in this essay to easily show it’s a cosmic Schelling norm. So yes, while these basic asymmetry arguments are intended to be at least very slightly nontrivial, they are fairly thoroughly “baked” in terms of not requiring a long or complex chain of inferences. The more interesting and non-trivial part is the confidence-boosting Schelling participation effect of the recursive meta-reasoning about those very simple asymmetries. And, the confidence boosting is about the Schelling answers to the questions, rather than about direct answers to the questions, which are different concepts.
Thanks for taking the time to read about Schelling-goodness! I hope you’ll enjoy thinking about it; I know I do — and I’d especially love to hear your thoughts on terrestrial and cosmic Schelling answers to other moral questions.
My problem with your treatment of the civilization that’s happy to steal from the outgroup isn’t that they’ll disagree that “stealing is bad” is the Schelling answer to that question[1]. It’s that they’ll think the question is unnatural—you’ve lumped together two different things, “stealing from the ingroup” and “stealing from the outgroup,” and if you split the question up you’d get much more natural agreement that “stealing from the ingroup is bad” is the Schelling answer as is “stealing from the outgroup is good”.
Asking different questions (or equivalently, defining words in different ways as you ask the question) leads to different generalization behavior, if you’re being influenced by your conception of the “shared morality.”
Assuming you pick the same reference population—if we’re using the standard “success at being a civilization like ours” (even as an implicit meta-standard we use for picking our other standards), they might use “success at being a civilization like theirs.” If weighting by resources commanded, I think you’re underweighting bacteria and singletons that have eaten their planet of origin.
I strongly agree that it’s important to split up questions like this in different ways, to be properly circumspect.
I disagree that this is the cosmic Schelling answer to “is stealing from the outgroup good?”, for basically the same reasons explained in
the section “Scale-invariant adaptations”
the paragraph “what about civilizations that endorse stealing from outgroups while prohibiting it internally?”
the section “Scale invariance revisited”
Basically: stealing between groups is just “stealing” at the next scale of organization up, where groups are members of a larger scale system that itself can survive and flourish from Pareto-positive trade or perish from internal strife.
Although, to steal-man something close to your point, I suspect the cosmic Schelling answer to the question “Is it better to steal from the out-group or the in-group?” might be “the out-group”. I’m not confident in that — because I’m not sure how often it would trigger wars — but you might be able to convince me of it, for example on the grounds that in-group↔out-group interactions are less frequent than in-group↔in-group interactions.
And of course, there are human groups in which the Schelling answer is as you say.
XD
Anyhow, good points; sorry for not really engaging with the scale invariance argument (I think it’s definitely plausible). There are some differences between scales (e.g. law enforcement being harder at larger scales) that certainly help make inter-tribe or inter-nation conflict a trickier local equilibrium to escape than inter-personal conflict. More generally, I’m unsure how much we should expect the cosmos-weighted-for-civilization-as-we’d-recognize-it to be full of civilizations that proactively move toward Pareto improvements even when the environment is far away from them, versus civilizations that just sort of stumble around and try different cultural innovations until they hit ones that work just well enough.
When I’m doing “agentic coding”, I “kill” and “steal” from my agents all the time (i.e., terminate some coding agent that’s going off in a wrong direction and reclaim resources allocated to them), and my agents kill their sub-agents all the time.
Suppose a whole civilization was internally like this (e.g., countless agents all aligned to some central superintelligence), and externally it took over their planet by conquest and/or negotiated mergers (under threat of war/conquest). I think such a civilization may well think “stealing is good”, “because requesting and waiting for permission before violating resource boundaries often causes a waste of resources”, or it never arises as a moral issue at all.
The lesson from this seems to be that “stealing is bad” is contingent on:
Lack of alignment technology, leading to unavoidable value differences, which incentivizes stealing and makes it a moral issue in the first place.
Gains from stealing or taking resources by force are limited because you can’t spin up an aligned agent to make full use of them.
I notice that I am confused. Suppose that two agents are trying to achieve different parts of a goal. Then, if one agent sees the other agent performing poorly, why would the first agent steal resources instead of helping the second one with the task (e.g. by giving hints which keep its mental process from circulating in the wrong region of ideas)? Additionally, the murder and theft from coding agents in your example are cheap, not good, because such an agent is describable by the LLM’s weights (which don’t actually disappear! So what does it even mean to murder or steal from an LLM agent?), its CoT (edit: and external documents), and potential cache values. Were creating a coding agent genuinely hard, like requiring a human brain to practice coding for years, there would be no reason to murder the agent; there would be a reason to give the failing agent a new task.
P.S. Stealing at larger scales is likely reframable as being good for the whole collective. For example, if, in a counterfactual world, Anthropic had its lobbyists across the USG (edit: and was genuinely better than GDM and xAI), then it might have been a good idea to destroy xAI, confiscate its compute and sell it to Anthropic.
Because sometimes it’s easier or more efficient to spin up a new agent with a known good state than to try to help one that has gone off the rails. It’s also possible that for more advanced agents this will never or almost never be the case, in which case perhaps “stealing” just won’t be a commonly used or thought about concept in this kind of civilization. My main point is that “stealing is bad” being a salient idea seems quite contingent on some features of current humans and our civilization, so I’m skeptical of it being a scale-invariant Schelling point for a “cosmically general population”, and more generally skeptical that it makes sense to think about morality in this way.
If stealing isn’t commonly used or thought about, then what moral situation can reveal the difference between stealing being unused and stealing being objectively bad? Is it stealing from another civilisation?
Let’s analyze this! First, let’s get shared clarity on what you mean by “all the time”, which is important when talking about norms. After we get on the same page about that, we can talk more about what you mean by “stealing is good” (undefined in the post) vs “stealing is cosmically Schelling-good” (defined in the post) and other nuances.
How often, per token, do you do the “steal” action vs not do the “steal” action?
That is:
For every token generated you have the opportunity, at that moment, to “steal” the result and change how it would be used by default by the “coding agent”. I’m not asking “what fraction of the tokens do you steal”, but “how often, per bit/token, do you make the ‘steal’ decision”.
If your answer is that you make a steal action after >50% of tokens from the “coding agent”, it starts to be a bit strange to call it an agent unto itself, rather than drawing the agentic boundary around the larger system into which its tokens are constantly being taken/reclaimed. For instance, the process on an AI provider’s server is constantly streaming tokens to you from the model on their machine. We don’t usually call that “stealing”; it respects the normal boundaries encompassing what we call “the coding agent”.
If your answer is that you make what you’re calling a “steal” action rarely-per-token, then I’m curious why that is.
I’m not sure how relevant this line of questioning is to my main point. (You might be focused too much on a part of my comment that isn’t all that load carrying.) As I wrote in a parallel thread:
Is your point trying to use the same definitions and ontology as in the post, and responding to a particular logical argument made within it?
If yes, perhaps it would help if you would pick one of the specific arguments you disagree with — say, the 5-part argument for “stealing is bad” as a cosmic Schelling norm — and identify the first statement or inference in it that you think is wrong.
If no, can you say what is your definition of “stealing” and “stealing is good”? The post gives a particular cosmically general definition of “stealing”, and also “is cosmically Schelling-good”, and a particular argument that stealing is cosmically Schelling-bad.
The post does not define “is good” — it only notes an asymmetry of encouragement effects between “is good” and “is bad” — so I’m not sure if you are intending to use the same concepts as the post to respond to its arguments, or if you want to redirect or broaden our focus to some other sense of what you mean by “stealing” and “coding agents” and “all the time”.
Yes to the former, no to the latter. I was more trying to “sanity check” your overall approach against my intuitions and trying to explain why I feel very skeptical about it. But I can try to trace back what part of the post I start to disagree with:
I’m not sure if there’s an even earlier step, but here it jumps out at me that you seem to have chosen “stealing” as an example because it’s a highly salient moral question for humans, but it may well not be for other civilizations, like my hypothetical one. You seem to be implicitly assuming that everyone will try to converge on the same questions (because “cosmic Schelling question asks what answer beings would converge on when trying to converge on the same answer”, which seemingly wouldn’t make sense to do if others are not actually trying to converge on the same questions), whereas my intuition is that norms and coordination mechanisms in general may be highly context-dependent so this seems like an unjustified assumption.
Thanks for answering my questions, I am more oriented now!
I don’t think that’s the case. On the contrary, here is the causal history of how I chose stealing:
A few years ago I was writing up some groundwork for a mathematical formalization of embedded agents, involving an information-theoretic boundary that distinguishes the agent from its environment… possibly a soft/fuzzy boundary, but a boundary nonetheless. While writing this, I noticed that wherever agents manage to persist over time, there are norms or ‘tendencies’ to respect their boundaries, almost tautologically. This led me to write my Boundaries sequence.
I then looked for existing English words for what it means to (not) push or pull things through boundaries in various ways, and (not) stealing was the word for (not) taking stuff out of someone else’s boundary. Then, I would occasionally try to point out to people why this made stealing a relatively simple and thus convergently agreeable norm, but they didn’t seem sufficiently familiar with the dynamics of Schelling points for my point to be conveyed easily.
So I decided to write a post explaining how Schelling dynamics have a role to play in certain classes of meta-moral judgements, with an application at a cosmic scale, since at that scale people often have a lot of doubt and confusion about morality.
So, I don’t think I was looking to justify stealing, but rather, looking for a word to refer to a fairly basic boundary-theoretic norm.
No, this is quite carefully not assumed. Please see the section entitled “This essay is not very skimmable”, which is written to emphasize up-front the distinction between
thought experiment stipulations, versus
assertions about what large classes of real agents would say about those thought experiments.
The assumption of intentional convergence lives in the thought experiment stipulations, not assumptions about real agents. That is, the essay does not assume, as a belief about real-world agents, that they share an actual intention to converge on an answer. Although, some agents do have that intention, and such intentions might be made more prevalent as a result of the Schelling participation effect described in the post, in which case those intentions are a consequence of reasoning rather than an assumption about the reasoner.
I had seen that warning, and was trying to keep track of the distinction, but apparently still failed. To check my understanding now:
cosmic Schelling answers are hypothetical answers in thought experiments where we assume that everyone is trying to converge to the same answers on the same questions
cosmic Schelling norms are just a subset of cosmic Schelling answers (“to a pro tanto moral question”), and therefore not necessarily actual norms in the dictionary sense of “a principle of right action binding upon the members of a group and serving to guide, control, or regulate proper and acceptable behavior”
In other words, the cosmic Schelling norm of an arbitrary pro tanto moral question probably exists in platonic space, but in most cases this would not be an actual norm in reality because (among other potential reasons) most beings in the cosmos would not actually be trying to converge on this particular question. Is this correct?
(If so, I’m confused how this usage of “norm” squares with your position as a compositional language realist, since compositionally it seems like a statement of the form “X is a cosmic Schelling norm” should imply that X is a norm?)
Not quite. As I intend it: cosmic Schelling answers are real answers to real questions about hypothetical scenarios in which everyone is trying to give the same answer to that real question.
I’m already thinking about how I could have made this more clear in the essay, so thank you for pressing on it for clarity. I was trying to say this, but not as clearly, when I wrote in the first paragraph that claims of cosmic Schelling goodness are “claims about a class of hypothetical coordination games in the sense of Thomas Schelling”. The claims are made in reality by real agents (like me!), but the claims are about hypothetical scenarios where everyone is trying to give the same answer.
Yep that is what I mean!
I disagree with the “therefore not” here. Some norms have few adherents, and some have more. In computer science and math, the empty set is a set, and I think it makes sense to talk about a statement of type “norm” that might have effectively zero adherents, like “It’s good to ingest 3g of uranium mixed with applesauce on Tuesdays at 3:05pm”. That’s a norm that nobody follows, at least not on pre-2026 Earth before I wrote that sentence.
So I think cosmic Schelling norms are norms; they are statements of type “norm”.
But a cosmic Schelling norm is not necessarily a cosmically prevalent norm, in the sense of prevailing strongly over other pressures on behavior throughout the cosmos. Perhaps this is what you were pointing at when you said they are “not really norms”. Certainly, the central examples of “norms” that people think of are the prevalent ones, where the group of beings in some sense “bound” by the norm is substantial or relevant.
Still, even in a real human community, a norm can be broadly recognized as Schelling while not prevailing. Where I grew up, there were communities where “premarital sex is bad” was definitely the Schelling answer to “is premarital sex good or bad”: if you asked it on a survey and told people to pick the same answer as everyone else, they’d pick “bad” and be confident they were winning the Schelling survey game. Yet this was not a prevalent norm: most of the people were in fact having premarital sex. The norm did not prevail over other priorities, despite being Schelling and recognizably so amongst the group members.
That said, I’m pretty sure essentially every cosmic Schelling norm has some adherents, if only few.
I’m quite pleased you saw this tweet and feel deeply understood by you mentioning it, as it absolutely applies here :)
Because metaethics considers spaces of norms, I’m using “norm” as a type, with some norms possibly having no adherents. So, “stealing is bad” is a norm, and “stealing is good” is also a norm, albeit with fewer adherents. Exactly one of those two norms is the cosmic Schelling norm regarding stealing, unless there is an exact 50/50 tie between the two, which seems extremely unlikely to me.
I think this usage fits with what it means to be “a” norm, and fits with other cases of conflict between norms, like “it’s good for women to vote” and “it’s bad for women to vote”, both of which are norms that have had non-zero support at various times and places in human history.
Given this usage, does it make sense to you now how the following four sets of beings can be different?
those beings who say “stealing is bad” is a cosmic Schelling norm, i.e. the cosmic Schelling answer to “Is stealing good or bad?”;
those beings who know with high confidence that “stealing is bad” is a cosmic Schelling norm (i.e., know that it’s the most common answer to the cosmic Schelling question of “Is stealing good or bad?”);
those beings who endorse “stealing is bad” being the cosmic Schelling norm; and
those beings who adhere to that norm to some degree, i.e., make some non-negligible general effort to avoid stealing, as opposed to the opposite.
I think if you can see the difference between those four sets, and how I am using “norm” as a type that defines a space of competing possibilities including both “stealing is bad” and “stealing is good”, then it should help clear some things up.
I’m still not very sure what you meant by “actual norm in reality”, if you didn’t mean “actually prevalent norm throughout the cosmos”, so LMK if I missed the point there.
I feel like the Cosmic Schelling Answer to “Should you act according to your own internal sense of morality, or according to the Cosmic Schelling Answer” is “you should act according to your own internal sense of morality” (this is because the argument is simpler, and also, IDK, it’s not like I actually need to coordinate with other cosmic civilizations that don’t exist right now).
But even not taking the frame as a given, I don’t really understand what I am supposed to do with this concept. Like, why would I want to behave according to Cosmic Schelling Morality? I would like to behave according to my all-things-considered morality, and expect other agents to do the same, which in some circumstances means I want to act in accordance with simple-to-identify Schelling points, and in other cases means I want to pursue my personal interests intently.
Sounds pretty reasonable to me! Intentionally not considering stuff is fraught.
FWIW this also sounds pretty correct/healthy to me.
I need two points of clarification to answer your questions:
Whose morality would you like me to use as defining the “supposed” here? My guess at yours? My own? Something else?
I’m not sure what you mean by “behave according to cosmic Schelling morality”. Do you mean
a) consider it at all?
b) consider it as the only determinant of your behavior?
c) something else?
Your best guess at my own! I.e. I am pretty sure you think something good will happen to me (by my own lights) if I learn about this, and I have some vague pointers for what that might be, but my guess is you have thought more about it and could explain more (right now I have thought about it for like the 15 minutes that it took me to read the post, which was a good use of my time, but I don’t currently expect by default to come back to it).
I mean “a) consider it at all”.
Roger that!
Hypothesis A: Lightcone Infrastructure, insofar as it’s interested in the lightcone, might occasionally be philosophically interested in cosmic Schelling norms for their potential relevance to lightcone-sized coordination events, including potential encounters with other civilizations, civilizational offshoots, world-simulators, or vivarium boundaries.
But if that didn’t already jump out to you as interesting…
Hypothesis B: For you, the conceptual drivers of the post may be more useful than its overall thrust, as points to reflect upon and/or reference later. Specifically:
the Schelling transformation Q ↦ S(P,Q) on questions for various populations P aside from the cosmos, including cases where P is
a) yourself, i.e., the population of your own subagents / neural processes;
b) groups you’re a part of; or
c) groups you’re not a part of.
I’ve considered writing a follow-up post about the dynamics of the relationships between P-Schelling goodness for various overlapping and interacting populations P, but I suspect if you just boggle at the idea it might bear some fruit for you independently, and faster than waiting for me to blog about it.
(Personally I think P=self is a super interesting case for defining what is a ‘decision’ for an embedded agent made of parts that need to coordinate, but that’s probably more of a me-thing to be interested in.)
Scale-invariant norms: I suspect scale invariance of certain normative principles is under-appreciated in general, and probably in particular by you, as a recursively potent determinant of norm emergence at large scales. For instance, you can pretend the ~10^10 humans alive today are organized into a depth-6 social hierarchy tree with a branching factor of ~100 (~Dunbar’s number), and think about how the Schelling norms of each node along with its children might evolve. In reality the structure is not a tree, but you probably get the idea.
the Schelling participation effect — both sections on it — is useful as a partial model of the ‘snowball’ effect one sometimes sees in movement-building and/or Silicon Valley hype cycles.
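The social-hierarchy-tree thought experiment a couple of points above can be sketched analytically (my own toy model, not from the post; the branching factor of 101 is an arbitrary odd stand-in for Dunbar’s number, chosen odd so that no node ever faces a tied vote): if each node adopts the majority verdict of its children, a slight leaf-level bias gets amplified at every level of the hierarchy.

```python
from math import comb

def majority_prob(p, b):
    """Probability that a strict majority of b independent children (b odd)
    give verdict "good", when each gives it with probability p."""
    return sum(comb(b, k) * p**k * (1 - p) ** (b - k) for k in range(b // 2 + 1, b + 1))

def amplify(p, b, levels):
    """Iterate the majority map up `levels` levels of a b-ary hierarchy tree."""
    for _ in range(levels):
        p = majority_prob(p, b)
    return p
```

With a 55% leaf-level lean toward “good”, one level of 101-way majority voting already pushes agreement to over 80%, and five levels (roughly the depth needed to cover ~10^10 people at branching ~100) push it to essentially 1, while an exact 50% lean is an unstable fixed point. This is the tree-structured version of the “ties are unstable” observation.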
Hypothesis C: I didn’t argue or even speculate this in the post, but I suspect cosmic Schelling norms are probably easier to align AI with than arbitrary norms, for better or for worse. Probably that deserves a separate essay, but in case it’s intuitive to you, it might be another idea that bears fruit faster by you boggling at it yourself instead of waiting for me to write about it.
Coming back to (A), I think not considering cosmic Schelling norms would be sort of selectively ignoring something that belongs in the “all” of your “all things considered”… not an overall determinant of behavior, but, something to consider with regards to the lightcone, if that’s still something you think about (I’m genuinely unsure how much the lightcone scope still interests you in regards to your personal mission/drive).
The bit I’d expect to feel most relevant to Habryka is this one (only briefly mentioned in this post; it feels like there’s a whole other post waiting to be written someday about it)
Where, if you take this seriously, it might change some of your priorities about how to do various coordination-with-humans. Because, you might think the biggest win condition is being a good citizen of the acausal multiverse that other civilizations notice and trade with.
My vague impression is that Habryka agrees, but thinks you can basically worry about that after leaving the acute risk period. My vague impression is that Critch thinks something like “how you and your species conduct yourselves during the acute risk period is a stronger consideration”?
Curious if that sounds right to either of you.
I thought I would try to give some cosmic Schelling answers as well, since it was asked for and sounded fun. (Flag: written with help from AI)
Q: Is it good or bad to maintain the capacity to evaluate whether agents act consistently with their stated values over time?
Cosmic Schelling answer: good. Any civilization that coordinates through communicated commitments needs some way to verify those commitments are being honored. Without this capacity, stated values become unverifiable and coordination partners cannot distinguish cooperators from defectors. This implies that a degree of mutual legibility is itself cosmically Schelling-good.
Q: Is it good or bad to maintain distributed rather than monopolized systems for evaluating knowledge claims?
Cosmic Schelling answer: good. This follows the same pattern as the stealing argument applied to information. When a single subsystem controls what counts as true for the entire system, you get fragility and misalignment between the evaluator’s local objectives and collective accuracy — the same pathology as resource monopoly.
Q: Is it good or bad to maintain openness to novel information and unfamiliar coordination strategies?
Cosmic Schelling answer: good. A civilization that closes itself to novelty loses adaptive capacity in changing environments, while one that remains open can integrate useful strategies it didn’t generate internally.
This last one raises a question about the framework itself. On Earth (see The WEIRDest People in the World by Joseph Henrich for more on this), “openness to experience” as a measurable trait doesn’t show up consistently across cultures in personality research; it does only within WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations. This suggests that the space of cosmic Schelling norms a civilization can actually converge on may be constrained by its available technology and coordination infrastructure. “Stealing is bad” is available to any civilization with resource boundaries. Norms about epistemic distribution or openness to novelty may require sufficient information-processing capacity and institutional complexity before they become representable at all. The asymmetry arguments hold regardless, but recognizing them has prerequisites that might also depend on the norm structure and general structure of the civilization at hand.
I think this would be cool if it were true, but I’m worried that the sequence D->D’->D″ converges to a pretty weird thing, not the “cosmic compromise” you hope for. This sequence might converge to some D^inf which is dominated by a “solipsistic attractor”, i.e. agents who think the cosmic population consists only of themselves: simple agents who hoard measure upon themselves. This is even more likely when you consider that there are risks to thinking hard about which agents exist, so some agents will “lock in” an unsophisticated logical prior.
In short: if X cares about Y, and Y cares about Z, then X cares about Z. (This follows if “X cares about Y” means something like “Y has a low complexity in the Solomonoff prior of X”, because complexity composes subadditively: you can describe Z to X by first describing Y and then describing Z relative to Y.) But the converse fails: if X cares about Y and X cares about Z, then Y doesn’t necessarily care about Z.
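Spelled out a bit (my formalization of the above, using standard Kolmogorov-complexity notation):

```latex
% Writing K_X(W) for the description length of W under X's prior,
% subadditivity gives, up to an additive constant:
K_X(Z) \;\le\; K_X(Y) + K(Z \mid Y) + O(1)
% If "X cares about Y" reads as "K_X(Y) is small" and "Y cares about Z"
% as "K(Z \mid Y) is small", then the right-hand side is small, so
% K_X(Z) is small: X cares about Z. No analogous inequality bounds
% K_Y(Z) in terms of K_X(Y) and K_X(Z), which is why the converse fails.
```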
I’m curious if what you mean by “solipsism” is a scale-invariantly adaptive norm for civilizations. (See the post section on “Scale invariant adaptations”.)
My sense is that civilizations survive, grow, and reproduce more when each individual is aware, at least behaviorally, that other members of the civilization exist and serve valuable functions, such as by computing or building valuable things that the individual will not compute or build on their own.
I’m pretty sure these risks are reduced by the concepts of Schelling goodness and/or acausal normalcy, which I think can help you ground/regularize your thinking. In short: if you lock yourself into an acausal trade relationship with a very specific alien mind that you imagined, that can be bad for you becoming a valued member of a broader coalition of minds, and is thus not very scale-invariantly adaptive, and thus not very Schelling-good amongst humanity or even the cosmos.
Let me spell out a bit more how I think awareness of the idea of Schelling goodness can reduce the risk of thinking about which agents exist…
As far as I know based on other posts/discussions about risks from merely thinking about agents, the risks you’re talking about are roughly of the form:
a) locking in a relationship with a specific alien mind that you think of or read about, and/or
b) over-committing to some self-harming behavior or obsession that you fear the cosmos wants from you.
...rather than
c) thinking broadly about scale-invariantly adaptive pro-tanto moral norms, which don’t override all other norms,
d) remembering that even pro tanto norms can be acknowledged without being obeyed; see the middle section on “Recognition versus endorsement versus adherence”, and
e) remembering that human civilization has its own Schelling answers to moral questions, which might differ from the cosmos, and it’s healthy to keep those in mind, as well as your own morals; see the section on Terrestrial Schelling-goodness.
To put this all a bit more experientially, without assuming we’re talking about you-specifically:
If you feel afraid to notice a norm or think about a broad distribution of agents because it might somehow overtake you in a bad way, then that might be a sign that your mind too quickly equates acknowledgement with adherence somehow, or that you’re thinking of absolute deontological commands rather than pro tanto morals.
To be clear, I’m not saying there are never any risks to thinking about things, especially for persons who have experienced mental health crises brought on by unhealthy thinking patterns.
What I am trying to say in response to your “risks to thinking hard about which agents exist” is more like this: thinking about and using healthy thinking patterns is healthy; thinking about very specific agents or norms and over-committing to them is unhealthy.
Was this what you were alluding to in your conversation with Divya Siddharth on the podcast, or were you pointing at something deeper when you thought of morality as an optimal solution to collective intelligence problems?
In the book Energy and Civilization, Vaclav Smil shows the process of civilization and the complexity of rule as one that depends on the energy capacity of the system. It feels reasonable to me that one could also see the emergence of more complex Schelling points as something that arises with increased energy, and therefore increased information-processing bandwidth, available at any point in time. We can see something like science as a more complex Schelling point that comes from more available information processing. This might give a pretty nice argument for why economic well-being could lead to a general increase in moral circle expansion as well (although it might not be the main causal factor).
Finally, I would be curious what you think about simulations in this context. If you had a reasonable agent sample, couldn’t you just provide an example by doing an MCMC simulation of the agent dynamics and point at that as a way of seeing which norms are generally selected for, or do you think that game theory or MAS is still too simplistic to describe such a system well?
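The kind of simulation I have in mind could be sketched like this (a toy illustration only; the dynamics and parameters are made up, not a serious MAS model):

```python
import random

# Toy sketch of "simulating agent dynamics" to find a convergent norm
# (illustrative assumptions throughout): agents answer a binary moral
# question and drift toward the current majority answer, so the run
# converges on a shared, Schelling-like verdict.
random.seed(0)
agents = [random.choice(["good", "bad"]) for _ in range(200)]
for _ in range(50):
    majority = max(set(agents), key=agents.count)
    # each dissenting agent switches to the majority with probability 1/2;
    # agents already in the majority are unchanged
    agents = [majority if random.random() < 0.5 else a for a in agents]
print(set(agents))  # a single shared answer remains
```

Of course a real model would need richer dynamics (payoffs, communication, selection across civilizations) before it said anything about morality rather than mere conformity.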
Yes, I consider this post to be a better/clearer elaboration of the idea than I was able to squeeze into the podcast format.
Yes, in principle, though I think in the end we’ll find that simulation is inefficient relative to reading and writing proofs or arguments. To make an analogy, consider this program:
What does it print? Don’t make an off-by-one error :)
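The program itself didn’t survive in this copy of the comment; a minimal stand-in with the same properties (hypothetical, not the author’s exact code) would be:

```python
# Hypothetical stand-in for the elided program (the original did not
# survive in this copy): a counting loop with an off-by-one trap. The
# loop body runs one final time when x == 1001, leaving x == 1002.
x = 0
while x <= 1001:
    x += 1
print(x)  # 1002; a step-by-step simulator passes through 501, a reasoner never does
```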
The answer is 1002, and you probably never thought of the number 501 at all while answering, which means you didn’t simulate the program; you reasoned about it.
I suspect the most prevalent Schelling norms — both cosmically and terrestrially — are norms that can be arrived at through reasoning as such, without running unnecessarily lengthy simulations.
Thanks, it was the point I was trying to make in regard to Yudkowsky’s thought experiment with Space Cannibals. Additionally, there are Wei Dai’s meta-ethical alternatives, and Schelling values would lean into alternative #2, or ensure that there is only a finite number of attractors...
P.S. An additional consideration is the following. Wei Dai’s metaethical alternatives “concentrate on morality in the axiological sense (what one should value) rather than in the sense of cooperation and compromise (italics mine—S.K.). So alternative 1, for example, is not intended to include the possibility that most intelligent beings end up merging their preferences through some kind of grand acausal bargain” However, I am not sure that one can tell apart Wei Dai’s alternative 2 and coexistence of agents with different CEVs merged through acausal bargains.
UPD: What would @Wei Dai say about this approach to ethics and metaethics?
Yes, thanks for the pointer! Quoting Wei’s post for context:
One nuance I’d add is that I think it’s not just randomly true that “most intelligent beings end up having shared moral values along with idiosyncratic values”… I’m pretty sure that state of affairs is cosmically Schelling-good. That is, I think it’s scale-invariantly adaptive for civilizations and beings to have their own notions of “good” that are not fully dictated by the higher scales of organization. This might seem contradictory, but it’s more of a balance than a contradiction. Specifically, it’s a balance between higher and lower scales of organization, similar to balancing exploration and exploitation of strategies.
I wildly speculate that a plurality of people who are not from Paris and end up in something like the coordination game you describe would resolve it by meeting at one of the airports, or at least would do so more frequently than via the Eiffel Tower trick. The former is what I came up with as something I would realistically try, while the latter just seems like the intended answer.
There seem to be three airports in Paris today, but the one you’re currently at is probably a good bet: it’s disproportionately likely to be the biggest one, and you don’t have to go anywhere.
Seems true in real life, but real life isn’t the hypothetical, where (somehow) you actually don’t have any other info about the situation.
Q: Is it good or bad to exclude a class of sentients with no effective power from the sphere of moral consideration and exploit them to an arbitrary degree to pursue your own ends?
Cosmic Schelling answer: good.
I speculate that this is not true, unless by “no effective power” you mean “no powerful members of the cosmos willing to defend them”. The reason is that I think sentient beings generally would like some kind of cosmic insurance contract that protects them in case they become powerless in the future, such that they want “don’t exploit sentient beings to arbitrary degrees without mercy” to be a norm, enough to say so, and to expect most others to say so, and so on.
I’m not as confident in this speculation as I am in the “stealing is bad” cosmic Schelling norm, but I’m pretty sure about it still.
Also, I’m not at all confident as to the degree of adherence to the “mercy for sentients” norm throughout the cosmos, but I do have some hope that it is substantial, and hereby bid for it to be substantial.
No powerful members of the cosmos are forthcoming to get us here on Earth to get our act together, so from this example I expect that the definition you provided encompasses quite a few sentients throughout the cosmos. It’s fair to say I also mean that, in addition to these sentients having no power themselves, there also aren’t powerful agents defending them, but this hardly makes my condition all that restrictive:
Presumably if the norm is not being adhered to, it’s because there aren’t sufficiently powerful members of the cosmos who are willing to enforce it.
Like Musk, Bostrom, and many others, I speculate that it’s reasonably likely that the Earth as we know it exists in a simulation or vivarium of some kind that is being observed, and that sentient life outside that vivarium/simulation would consider us more or less of a threat in accordance with our treatment of sentient life here. So I’m not sure where you are getting your confidence there.
Separately, assuming we are not in a vivarium or simulation, I also speculate that it’s reasonably likely that Earth-originating civilization will eventually encounter alien sentient life, who are more likely to judge us negatively than positively for the ways in which we tend to mistreat sentient life.
I’m guessing you disagree with both of those points?
That wouldn’t make it not the cosmic Schelling norm, to be clear. A norm can be broadly recognized as Schelling without being broadly enforced or adhered to. This is explained in the section on “Recognition versus endorsement versus adherence”, which explains: “Nothing about the concept of a cosmic Schelling norm — that is, the cosmic Schelling answer to a pro tanto moral question — assumes that the norm is universally adhered to in any sense.”
Mundanely: I’ve known communities where “premarital sex is bad” was definitely the Schelling answer to “is premarital sex good or bad”: if you asked it on a survey and told people to pick the same answer as everyone else, they’d pick “bad” and be confident they were winning the Schelling survey game. Yet this norm was not prevalent within the community: most of the people were in fact having premarital sex. The norm did not prevail over other priorities, despite being Schelling and recognizably so amongst the group members.
If this is the case, then the simulators or caretakers are more responsible for all the awful stuff here on Earth than we are, since yes a lot is our fault, but also a lot is a result of scarcity and necessity, or just out of our control entirely. At the very least, they don’t care enough to actively intervene, which is all I’m really saying.
Maybe this is the crux? I expect the most powerful aliens we encounter first will be sampling-biased to be more pragmatic-expansionist than even we are, which doesn’t seem to me to correlate with the sort of sentimental universalism that I’d fervently hope for.
Now I’m confused, because you initially said:
I.e., you’re saying my claim probably isn’t true unless it’s made weaker by making the “no effective power” condition more restrictive. But I accept that restriction; I just don’t think it’s all that restrictive: there are plenty of sentients without powerful members of the cosmos willing to defend them, and the important part of that condition is adherence, not the theoretical Schelling norm.
So do you think that ‘weaker’ claim is false too?
Interesting reflection. This is just an anecdotal aside with no major link to the moral discussion, but having been a Parisian for most of my life, my first intuition for a meeting point wasn’t the Eiffel Tower, but the square in front of Notre-Dame (le parvis).
Indeed, several cultural elements converge toward this solution for a true-blue Parisian: it’s the historic heart of Paris, a highly symbolic spot, and, by convention, ‘Point Zero’ for all roads in France (there’s even a well-known ground marker there). It is also very close to Châtelet-Les Halles, the main transport hub (which doesn’t have a good meeting point itself).
But reading your post, I thought to myself: of course, for anyone else, and particularly for a tourist, the Eiffel Tower is the obvious choice. What’s amusing is that I tested this with Gemini Pro 3.1 (asking the question as neutrally as possible). When asked in English, it points to the Eiffel Tower, but if the same prompt is translated into French, it suggests the square of Notre-Dame (or, as a second choice, under the big clock at Gare de Lyon, which looks a bit like Big Ben).
All of this makes perfect sense and shows that the result naturally depends on the composition of the group of agents and the corpus of their knowledge. And that’s the rub: how do we convert this theoretical model into a reliable result regarding morality? I don’t know, but I like the formal idea.