Trying to distill why strategy-stealing doesn’t work even for consequentialists:
Consider a game between A and B, where at most 1 player can win and:
U_A(A wins)=3, U_A(B wins)=2, U_A(both lose)=0
U_B(A wins)=0, U_B(B wins)=3, U_B(both lose)=0
At time 1, A has a button that, if pressed, ends the game and gives a 40% chance of both players losing and a 60% chance of A winning. A can press, pass, or surrender (giving B the win). At time 2, the button passes to B, who has the same options, with “press” giving a 60% chance of winning to B. At time 3, if both passed, they each have a 50% chance of winning.
Solving this backwards: at time 2, B should press, because that gives U=.6×3=1.8 vs .5×3=1.5 for passing; so at time 1, A should surrender, because U_A(surrender)=2 > U_A(press)=.6×3=1.8 > U_A(pass)=U_A(B presses)=.6×2=1.2.
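For concreteness, here’s a minimal Python sketch (mine, not part of the original comment) that solves the game above by backward induction; the payoffs and options are exactly as described, and the helper name lottery is just illustrative.

```python
# Backward-induction check of the toy game. Payoffs: U[player][outcome].
U = {
    "A": {"A wins": 3, "B wins": 2, "both lose": 0},
    "B": {"A wins": 0, "B wins": 3, "both lose": 0},
}

def lottery(p_a_wins, p_b_wins):
    # Expected utilities for both players under a win/lose lottery.
    p_lose = 1 - p_a_wins - p_b_wins
    return {p: p_a_wins * U[p]["A wins"] + p_b_wins * U[p]["B wins"]
            + p_lose * U[p]["both lose"] for p in "AB"}

def time2():
    # B chooses among press (60% B wins, 40% both lose), pass, surrender.
    options = {"press": lottery(0, 0.6), "pass": lottery(0.5, 0.5),
               "surrender": lottery(1, 0)}
    best = max(options, key=lambda a: options[a]["B"])
    return best, options[best]

def time1():
    # A chooses, anticipating B's best response at time 2.
    _, after_pass = time2()
    options = {"press": lottery(0.6, 0), "pass": after_pass,
               "surrender": lottery(0, 1)}
    best = max(options, key=lambda a: options[a]["A"])
    return best, options[best]

print(time2())  # ('press', {'A': 1.2, 'B': 1.8}) up to float rounding
print(time1())  # ('surrender', {'A': 2, 'B': 3})
```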
In terms of theory, this can be explained by this game violating the unit-sum (mathematically equivalent to zero-sum) assumption of strategy-stealing. It confuses me that it has significant mind-share among AI safety people, e.g. @ryan_greenblatt here, despite the world in general, and technological races in particular, obviously not being zero-sum. See also my failure to “steal” the strategy of investing in AI companies.
I agree that it’s weird how widely and uncritically the assumption is endorsed; in particular, it’s often cited as if it were some kind of result or theorem, when even the original articulation is hesitant (though not hesitant enough, as it happened)!
Unfortunately my guess is the concrete articulation above is not especially catchy or illuminating. I suspect the more abstract gesture at constant-sum might be both more general and more catchy.
My view is something like “if you ~100% solved alignment, then the situation is mostly unit-sum from the perspective of longtermists, because they care mostly about long-run resources and this is mostly unit-sum, with a few notable exceptions (e.g. vacuum decay)”. Do you disagree with this claim? I certainly agree that not having solved alignment means you can’t effectively strategy steal, and that other things can go wrong with strategy stealing, especially if you aren’t maximizing expected long-run resources. (In general, you in principle may also need to take very aggressive and undesirable actions to defend yourself as part of strategy stealing, like staying in a biobunker while limiting any memetic exposure to the outside world.)
My view is something like “if you ~100% solved alignment, then the situation is mostly unit-sum from the perspective of longtermists, because they care mostly about long-run resources and this is mostly unit-sum, with a few notable exceptions (e.g. vacuum decay)”. Do you disagree with this claim?
My impression since reading Robin Hanson’s Burning the Cosmic Commons is that space colonization is closer to a tragedy of the commons situation than unit-sum (as you can kind of infer from the title).
Also there’s always the possibility of large-scale wars that destroy or degrade significant portions of the cosmic endowment. Even if war never happens, the mere possibility implies that the game isn’t unit-sum, and the more altruistic side is unable to “steal” certain strategies of the other side, like threatening mutual destruction as a bargaining tactic.
Also Black-Hole Negentropy, where value scales superlinearly with resources (mass/energy); superlinear value is another way the game fails to be unit-sum.
space colonization is closer to a tragedy of the commons situation than unit-sum
My current best guess is that this seems possible but pretty unlikely. And that this type of negotiation seems particularly easy given the distribution of values I expect for the actors negotiating (e.g., strongly locust-like values aren’t that likely).
Why isn’t it likely, given that you can “burn” more resources in order to grab a larger share of the lightcone? If you’re saying that the outcome of burning the cosmic commons isn’t likely because everyone will negotiate to avoid it, I’m saying that the game structure itself isn’t zero-sum, which is needed to show that strategy-stealing applies in theory.
And that this type of negotiation seems particularly easy given the distribution of values I expect for the actors negotiating (e.g., strongly locust-like values aren’t that likely).
I do not know of a result, nor do I have the intuition, that if negotiation is “easy” then strategy-stealing (approximately) applies. My intuition is that even in this case (like in my toy game) some parties can credibly threaten to burn down the world (or to risk this), and others can’t, and this gives the former a big advantage that the latter can’t copy. Negotiation is “easy” in my game too (note that the outcome is Pareto optimal, and no risky action is actually taken), but the more cautious or altruistic party is disadvantaged.
I don’t currently think you can burn more resources to grab a larger fraction of the lightcone. Or like, I think the no-negotiation equilibrium burns a small fraction of resources. I don’t feel super confident in this view, but that was my understanding of our current best guess. I haven’t looked into this seriously because it didn’t seem like a crux for anything. Maybe I’m totally wrong!
My cached view is something like “you can send out an absurd number of probes at ~maximal speed given very small fractions of resources, so burning resources more aggressively doesn’t help”.
The following LLM output matches my own understanding:
Ryan’s crux is his “cached view” that you can send probes at nearly maximal speed using very small fractions of resources, so burning extra resources doesn’t help. This violates the physics of relativistic travel.
Because of relativity, kinetic energy scales non-linearly as you approach the speed of light: KE = (γ − 1)mc^2, where γ = 1/sqrt(1 − v^2/c^2). The energy required to accelerate an object approaches infinity as its speed approaches c.
If Actor A wants to beat Actor B to an uncolonized star system, and Actor B launches a probe at some speed v close to c, Actor A must launch at a speed greater than v to get there first.
Upgrading a probe’s speed from 0.9c to 0.99c, and then from 0.99c to 0.999c, requires exponentially more energy for the same payload mass.
Furthermore, if you want your probe to actually do something when it arrives (like decelerate, build infrastructure, and defend itself), it needs mass. To decelerate without relying entirely on ambient interstellar medium, you have to carry fuel for the deceleration phase, which exponentially increases the launch mass required (the Tsiolkovsky rocket equation).
Therefore, Robin Hanson’s “Burning the Cosmic Commons” scenario is physically accurate. In an uncoordinated race for the universe, colonizers must convert almost all available local mass/energy into propulsion to outpace competitors. Securing a larger share of the lightcone absolutely requires burning vastly more resources.
The LLM output doesn’t seem nearly quantitative enough. Beyond some number of 9s, it surely doesn’t give you a meaningful advantage to go at 0.99...99c rather than merely 0.99...9c — especially when you factor in that it probably takes time to convert energy/mass into the additional speed (most mass will be in between your origin and the farthest reaches of the universe, and by the time some payloads have decelerated and started harvesting significant energy from the middle mass, the frontier of the colonization wave will likely already be quite distant). I share Ryan’s guess that you can get close enough to optimum without burning a large fraction of all energy in the universe. (That’s a lot of energy!)
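To put rough numbers on the “extra 9s” point (a quick sketch of my own, not from the thread; the billion-light-year distance is an arbitrary illustration): each additional 9 roughly triples the kinetic energy per unit of payload mass, while shaving ten times less off the travel time than the previous 9 did.

```python
import math

def gamma(beta):
    # Lorentz factor for speed beta (as a fraction of c).
    return 1.0 / math.sqrt(1.0 - beta * beta)

D = 1e9  # travel distance in light years (illustrative)
for beta in (0.99, 0.999, 0.9999):
    ke = gamma(beta) - 1.0  # kinetic energy per unit payload mass, in units of mc^2
    years = D / beta        # arrival time in the launch frame
    print(f"{beta}c: KE = {ke:5.1f} mc^2, arrives after {years:,.0f} years")

# 0.99c:   KE =  6.1 mc^2, ~1,010,101,010 years
# 0.999c:  KE = 21.4 mc^2, ~1,001,001,001 years
# 0.9999c: KE = 69.7 mc^2, ~1,000,100,010 years
```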
I think you’re right that wasn’t really conclusive. Will try to address your arguments below.
Beyond some number of 9s, it surely doesn’t give you a meaningful advantage to go at 0.99...99c rather than merely 0.99...9c
This seems right but you can (probably) still gain a meaningful advantage by sending more colony ships (and war/escort ships) instead of pushing for more speed.
especially when you factor in that it probably takes time to convert energy/mass into the additional speed (most mass will be in between your origin and the farthest reaches of the universe, and by the time some payloads have decelerated and started harvesting significant energy from the middle mass, the frontier of the colonization wave will likely already be quite distant)
Are you assuming either that it’s possible to launch colony ships directly across the universe, or that it takes millions/billions of years to fully harvest a star (e.g. using a Dyson sphere while the star burns naturally)? If instead there’s a distance beyond which it’s infeasible or uncompetitive to try to directly colonize, like 10x the average distance between neighboring galaxies, and it’s also possible to quickly harvest a star using direct mass-to-energy conversion (e.g., via Hawking radiation of small black holes), then the colonies in the middle should have plenty of tempting new targets to try to colonize (before someone else does), at the edge of the feasible range?
I share Ryan’s guess that you can get close enough to optimum without burning a large fraction of all energy in the universe.
I’ll describe a toy model to convey my intuitions here.
Setup
Two players each own 0.5 of Galaxy 1. They compete for Galaxy 2 by consuming their Galaxy 1 resources as colonization effort (c).
Payoff
Player A’s total utility is their retained share of Galaxy 1 plus their competitively won share of Galaxy 2: U_A = (0.5 − c_A) + c_A / (c_A + c_B).
Solution
To find the Nash equilibrium, we maximize Player A’s utility by taking its derivative with respect to c_A and setting it to zero: dU_A/dc_A = −1 + c_B/(c_A + c_B)^2 = 0. Because the game is symmetric, both players will invest equal effort (c_A = c_B = c), which gives 1/(4c) = 1 and hence an equilibrium effort of c = 0.25.
Outcome
Both players sacrifice exactly half of their initial resources (0.25 out of 0.5). Because they invest equally, they split Galaxy 2 evenly (0.5 each). Their final score is 0.75 each.
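As a numerical sanity check (my own sketch, not part of the comment), a simple best-response iteration over a grid converges to the same equilibrium, c = 0.25 with final utility 0.75 each:

```python
def utility(ca, cb):
    # Retained share of Galaxy 1 plus contested share of Galaxy 2.
    return (0.5 - ca) + ca / (ca + cb)

def best_response(cb, grid=10_000):
    # Brute-force A's best effort in (0, 0.5] against a fixed cb.
    candidates = (i / grid * 0.5 for i in range(1, grid + 1))
    return max(candidates, key=lambda ca: utility(ca, cb))

cb = 0.1
for _ in range(25):         # iterate best responses until they settle
    cb = best_response(cb)
print(cb, utility(cb, cb))  # -> 0.25 0.75
```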
P.S., what do you think about my earlier points about war and black hole negentropy, which could end up being stronger (or easier to think about) arguments for my position?
IIRC someone I know tried to look into this at some point (at least the physics). I’ll see if I can learn what they found.
It confuses me that it has significant mind-share among AI safety people, e.g. @ryan_greenblatt here, despite the world in general, and technological races in particular, obviously not being zero-sum.
FWIW, I find it useful to think about strategy stealing, and don’t think it has too much mindshare. Not really sure how productive it is to argue about that, though, because “too much or little mindshare” seems hard to settle.
despite the world in general, and technological races in particular, obviously not being zero-sum
Just to respond to this in particular: Some situations are close to being zero-sum, and when they’re not, I think it’s often useful to explicitly track the reason why they’re not zero-sum and how that changes the dynamics.
My impression of people invoking strategy stealing is not that they’re actually assuming it holds without argument, but that they’re instead interested in specific reasons to believe it fails in a given situation, and (if they agree those reasons are real) often interested in quantifying how significant those reasons are. Ryan’s linked comment seems like an example of this.
Paul’s linked article talks about lots of ways that strategy stealing can fail, many of which aren’t downstream of violating unit-sum. (By my count, only 2 of them are about that.)
You say “even for consequentialists”, but iirc, non-consequentialism only really features in point 11, so that’s just one more.
Just to clarify: you’re not distilling the whole post, just providing an example for 1-2 of the issues.