This is Dr. Andrew Critch’s professional LessWrong account. Andrew is the CEO of Encultured AI, and works for ~1 day/week as a Research Scientist at the Center for Human-Compatible AI (CHAI) at UC Berkeley. He also spends around a ½ day per week volunteering for other projects like the Berkeley Existential Risk Initiative and the Survival and Flourishing Fund. Andrew earned his Ph.D. in mathematics at UC Berkeley studying applications of algebraic geometry to machine learning models. During that time, he cofounded the Center for Applied Rationality and SPARC. Dr. Critch has been offered university faculty and research positions in mathematics, mathematical biosciences, and philosophy, worked as an algorithmic stock trader at Jane Street Capital’s New York City office, and as a Research Fellow at the Machine Intelligence Research Institute. His current research interests include logical uncertainty, open source game theory, and mitigating race dynamics between companies and nations in AI development.
I actually think there are more people like that on LessWrong; the disagreement scores on both their comment and this post have been going up and down a lot, so I think there is high variance. That is, I think there is a genuinely divisive conflict on LessWrong as to whether radicalization to the degree of calling AI developers “murderers” is a good thing. (My position: it’s clearly a hyperbolic and thus irresponsible and/or bad-faith way to use the term ‘murder’.)
bc clearly it was designed to simulate the behavior of a pre-agi civ?
No? This makes no sense to me, unless you define “computer” to mean “computer built by a future human civilization”, which is a weirdly human-centric definition of computer.
Here’s a weirdly specific scenario to help illustrate why:
Suppose I use a 2040 MacBook to build an alien-like digital world with novel 13-tentacled lifeforms in it, that are not a simulation of anything that I believe exists. The aliens in that digital world have their own computers that look nothing like MacBooks, but I’m still quite interested in what they’ll do with their alien computers, and if they’ll make AGI. The computers run on “greenstone” circuits that are more like Minecraft redstone circuits than electrical transistors, but are actually different from both (not a simulation of either). The creatures then discuss, in their own way of communicating, as follows:
Creature 1: “If we’re in a computer, it’s sure to be a simulation.”
Creature 2: “Not necessarily. Even if we’re in a computer, it could be in some kind of digital vivarium that’s not a simulation of anything, just a computational world with artificial lifeforms created to live within it (us).”
Creature 1: “But if we’re in a computer, clearly it was designed to simulate the behavior of a pre-agi civ?”
Creature 2: “No? If we’re in a vivarium, its creators may be a wholly different civilization, for whom we are not a simulation of anything. Like, who knows, maybe they only have 4 limbs! Unless by ‘computer’ you specifically mean one of these 13-tentacle-operated devices that we-specifically built from greenstone circuits… then sure, yeah, our future civilization would probably be running the simulation in that case. But that’s a weirdly us-specific definition of ‘computer’, don’t you think?”
I of course agree that warning people about their enemies can be helpful to the people being warned.
I don’t agree that’s what Eliezer is doing with Amodei and Hegseth. Across multiple posts, he is promoting enmity toward both of them. Here he is saying that Hegseth’s position is even worse than Amodei’s: https://x.com/allTheYud/status/2027357747960766554?s=20
It’s of course logically consistent to say two people have bad positions and that one is worse. The strange thing is to be doing that while performatively “helping” Hegseth by “warning” him that AI leaders would “discard him like used toilet paper”. Taken in that context, the effect is more like symmetrically creating enmity between the two camps, as opposed to encouraging a positive resolution to the conflict.
I’ll also reiterate that Eliezer is far from alone in this pattern of promoting enmity between leaders in and around AI. He merely offers an easy-to-analyze public example of the pattern.
I get that a lot of people use “simulation” and “computer program” as basically synonyms, but that’s a bit linguistically impoverished for the hard work of analyzing distinct metaphysical hypotheses and their consequences. Consider that ideal behavior is different in
a) computer worlds that are built to mimic an existing world, in which case our “job” is to be similar to whatever is “out there” that we’re a simulation of, versus
b) computer worlds that are built to be a de novo home for a new kind of life or being, such as for entertainment or exploratory science.
(a) is a “simulation”, and if I discover I’m in a simulation, I might just go ahead and act like I’m not, to help the simulator with their intended purpose to mimic something that’s not in a simulation.
(b) is not a “simulation”, and if we ever discover we’re in one of those, I might look for other instructions for what I’m supposed or expected to do, from the creator(s). I call this a ‘digital vivarium’ but I’m open to other terms, just not ‘simulation’ which fails to correctly draw the distinction of, well, not being a simulation of anything.
if we are ultimately running on a computer, then wouldn’t that mean that we are a simulation?
Not really, if the computer program we’re inside is not designed to simulate anything.
Could we be running on a computer and also be a vivarium
Yes, because artificial life is still life.
Promoting enmity and bad vibes around AI safety
You seem to be assuming the value of grass aggregates without bound as a function of the amount of grass. Why wouldn’t there be diminishing marginal value to grass, as the amount of grass increased?
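To make “diminishing marginal value” concrete, here is a toy sketch (my own illustration, not anything from the thread; the saturating functional form is an arbitrary choice):

```python
import math

# Toy illustration: two candidate "value of grass" functions.
# The linear one aggregates without bound as grass increases;
# the saturating one has diminishing marginal value and a finite ceiling.

def value_linear(grass: float, value_per_unit: float = 1.0) -> float:
    """Value grows without bound in the amount of grass."""
    return value_per_unit * grass

def value_saturating(grass: float, ceiling: float = 1.0, k: float = 0.001) -> float:
    """Marginal value shrinks as grass increases; total value never exceeds `ceiling`."""
    return ceiling * (1.0 - math.exp(-k * grass))

for g in [10, 1_000, 100_000]:
    print(g, value_linear(g), round(value_saturating(g), 6))
```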
Inspired by a conversation below with @Wei Dai about the particular nesting of hypotheticals defining cosmic Schelling goodness, I just added the following clarifying section on Reality vs Hypotheticals:
Because there are so many more possible vivaria than simulations. Like, a simulation is simulating something, which involves a kind of match between what’s in the simulation and what it’s supposed to represent/simulate. If it contains life and life is the object of study, it’s a vivarium, and I think most simulations are vivaria, but most vivaria are probably not simulations.
We might hope that successful civilisations have norms that generalize to this case from the causal coordination they do.
Yep, I think in the end we’ll settle on irreducible-computation-causation as the main notion of causation relevant to morality, and then there won’t be a special case to be made about the causal/acausal distinction.
Can you separate these into separate comments, so people can vote separately on them?
Can you make these separate comments? Otherwise people can’t vote on them.
What’s something you believe that would get negative karma if earnestly expressed in a normal LessWrong conversation? Write it in quotes. Vote on the meta-claim “would get negative karma” using ✔️/X, where ✔️ = yes this would get negative karma, and X = no this would get positive or nonnegative karma.
Nice. Apropos, I’ve found words like “almost” or “approximately” are useful for saying something has relatively low moral worth without the fraught implication that the worth is literally zero. (Equalling precisely zero is a rare event with strong logical consequences.)
E.g.:
- “grass has almost no moral value”, versus just “grass has no moral value”
- “grass boundaries are nearly worthless”, versus just “grass boundaries are worthless”.
My sense is that people don’t acknowledge these caveats out of fear that someone will try to force them to debate about the magnitude of the near-zero value of something like grass. I think the key is to feel/be secure and ready to say, if they try to force debate, “Sorry, I don’t want to debate the magnitude; it’s positive and near zero.” and then just move on from the topic.
Personally I think the world as we know it is more likely to be in a vivarium than a simulation, though many of the same principles apply in terms of there being a powerful outside force that can shut down Earth-originating civilization if it appears to have bad morals. And yes, people writing proofs about us without simulating us is another pathway for our actions to matter separately from their naive causal consequences. And, although fewer people understand the proof-theoretic angle, I think the effect is even stronger than the simulation effect, because simulations are just a special case of proofs.
“well, for this particular definition of ‘steal’, this doesn’t really make it unpredictably costly for them to defend their resource boundaries because they don’t have resource boundaries or resource rights at all atm”.
Yes, this is an important point which I didn’t get into that deeply in my reply.
I’d need a more actively compelling reason to think it applied to grass.
Question: By “applied to grass”, do you mean “applied at all to grass”, or “applied as much to grass as to humans”, or something else?
I’m asking because I agree boundary protection norms apply less strongly to protecting grass than to protecting humans (both terrestrially and cosmically), but conflating small numbers with zero is quite fraught in terms of the moral implications as things scale up. Even saying “I round the value of grass to zero for attentional cost reasons” is different from saying “there is literally zero moral value in protecting grass”, and they take about the same number of words.
Thanks for bringing up this example! Comparing S(P,Q) and S(P’,Q) — i.e., the Schelling versions of question Q for populations P and P’ — is particularly interesting and important when one is a member of both P and P’. Real-world decision-making, to the extent Schelling dynamics matter, involves balancing the importance of these Schelling answers across different scales of organization. Figuring out that balance is where a lot of the hard work of moral reasoning comes from, I think.
(That’s also the main reason I defined C(Q) via a more general population-dependent operator S(P,Q), so future conversations about this stuff can more easily talk about multiple populations at once.)
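For concreteness, here is a minimal sketch of the operator shape being described; all names and types below are my own stand-ins, not anything defined in the post:

```python
from typing import Callable

# Stand-in types (mine, not from the post): a question, an answer, and a
# population of agents.
Question = str
Answer = str
Population = frozenset

# S(P, Q): the population-dependent Schelling operator. Its internals
# (how a population converges on a Schelling answer) are the hard part,
# so it stays abstract here.
SchellingOperator = Callable[[Population, Question], Answer]

def make_cosmic(S: SchellingOperator, cosmos: Population) -> Callable[[Question], Answer]:
    """C(Q) is just S(P, Q) with P fixed to the cosmos-scale population."""
    return lambda q: S(cosmos, q)

def toy_S(P: Population, Q: Question) -> Answer:
    """Placeholder resolver, purely to make the sketch runnable."""
    return f"Schelling answer of a {len(P)}-member population to: {Q}"

# Comparing S(P, Q) and S(P', Q) when one belongs to both populations:
P = frozenset({"you", "neighbor_1", "neighbor_2"})
P_prime = frozenset({"you", "stranger_1"})
print(toy_S(P, "Is grass worth protecting?"))
print(toy_S(P_prime, "Is grass worth protecting?"))
```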
Critch thinks something like “how you and your species conduct yourselves during the acute risk period is a stronger consideration”?
I wouldn’t argue hard for it being “a stronger consideration” because I think that’s a harder question.
But I would argue hard for it being “a consideration”, especially if someone was like “boo this is worthless and should round to zero”. And, if someone finds the “cosmic” framing distracting, I would also argue the following, in terms of real-world relevance this century:
- humanity and AI are both less likely to wind up in mutually destructive conflict if we both pay non-zero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
- individual groups of humans are less likely to get into needlessly destructive AI-powered wars with each other if we pay some non-zero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
- humanity and AI are both less likely to face simulation shutdown if we pay non-zero attention to scale-invariant moral principles and their Schelling-ness at various real-world scales.
These arguments are primarily meant not to override all other considerations, but to establish that these are considerations at all.
Your best guess at my own!
Roger that!
I don’t really understand what I am supposed to do with this concept.
Hypothesis A: Lightcone Infrastructure, insofar as it’s interested in the lightcone, might occasionally be philosophically interested in cosmic Schelling norms for their potential relevance to lightcone-sized coordination events, including potential encounters with other civilizations, civilizational offshoots, world-simulators, or vivarium boundaries.
But if that didn’t already jump out to you as interesting…
Hypothesis B: For you, the conceptual drivers of the post may be more useful than its overall thrust, as points to reflect upon and/or reference later. Specifically:
the Schelling transformation Q ↦ S(P,Q) on questions for various populations P aside from the cosmos, including cases where P is
a) yourself, i.e., the population of your own subagents / neural processes;
b) groups you’re a part of; or
c) groups you’re not a part of.
I’ve considered writing a follow-up post about the dynamics of the relationships between P-Schelling goodness for various overlapping and interacting populations P, but I suspect if you just boggle at the idea it might bear some fruit for you independently, and faster than waiting for me to blog about it.
(Personally I think P=self is a super interesting case for defining what is a ‘decision’ for an embedded agent made of parts that need to coordinate, but that’s probably more of a me-thing to be interested in.)
- Scale-invariant norms: I suspect scale invariance of certain normative principles is under-appreciated in general, and probably in particular by you, as a recursively potent determinant of norm emergence at large scales. For instance, you can pretend the ~10^10 humans alive today are organized into a depth-6 social hierarchy tree with a branching factor of ~100 (~Dunbar’s number), and think about how the Schelling norms of each node along with its children might evolve (see the sketch after this list). In reality the structure is not a tree, but you probably get the idea.
- the Schelling participation effect (both sections on it) is useful as a partial model of the ‘snowball’ effect one sometimes sees in movement-building and/or Silicon Valley hype cycles.
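As a quick sanity check on the hierarchy arithmetic in the scale-invariant-norms bullet above (my own back-of-envelope, not from the original comment):

```python
# With a branching factor of ~100, a tree six levels deep (a root plus
# five levels of branching) already has 100^5 = 10^10 leaves, roughly
# the number of humans alive today.
branching_factor = 100
branching_levels = 5
leaves = branching_factor ** branching_levels
print(leaves)           # 10000000000
print(f"{leaves:.0e}")  # 1e+10
```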
Hypothesis C: I didn’t argue or even speculate this in the post, but I suspect cosmic Schelling norms are probably easier to align AI with than arbitrary norms, for better or for worse. Probably that deserves a separate essay, but in case it’s intuitive to you, it might be another idea that bears fruit faster by you boggling at it yourself instead of waiting for me to write about it.
I mean “a) consider it at all”.
Coming back to (A), I think not considering cosmic Schelling norms would be sort of selectively ignoring something that belongs in the “all” of your “all things considered”… not an overall determinant of behavior, but something to consider with regard to the lightcone, if that’s still something you think about (I’m genuinely unsure how much the lightcone scope still interests you in regards to your personal mission/drive).
I would like to behave according to my all-things-considered morality,
Sounds pretty reasonable to me! Intentionally not considering stuff is fraught.
which in some circumstances means I want to act in accordance with simple-to-identify Schelling points, and in other cases means I want to pursue my personal interests intently.
FWIW this also sounds pretty correct/healthy to me.
I need two points of clarification to answer your questions:
I don’t really understand what I am supposed to do with this concept.
Whose morality would you like me to use as defining the “supposed” here? My guess at yours? My own? Something else?
Like, why would I want to behave according to Cosmic Schelling Morality?
I’m not sure what you mean by “behave according to cosmic Schelling morality”. Do you mean
a) consider it at all?
b) consider it as the only determinant of your behavior?
c) something else?
Why the focus on “reasons”?
Many things exist from causes that are not “reasons” in the sense of a decision-maker choosing something with an objective. All reasons are causes, but not all causes are reasons. For example, reproduction is a process that creates a lot of things without “reasons”, in the central sense of the word where something is doing “reasoning”.
And, if you wonder what caused you (or us) to exist, a good contender is “a causing-things-to-exist maximizer”.