But the SI consequentialists feel instead like a world model plus a random seed that identifies a location, and this feels much smaller than a world model plus a random seed that happens to match the external world’s random seed.
Solomonoff induction has to “guess” the world model + random seed + location.
The consequentialists also have to guess the world model + random seed + location. The basic situation seems to be the same. If the consequentialists just picked a random program and ran it, then they’d do as well as Solomonoff induction (ignoring computational limitations). Of course, they would then have only a small amount of control over the predictions, since they only control a small fraction of the measure under Solomonoff induction.
The consequentialists can do better by restricting to “interesting” worlds+locations (i.e. those where someone is applying SI in a way that determines what happens with astronomical resources), e.g. by guessing again and again until they get an interesting one. I argue that the extra advantage they get, from restricting to interesting worlds+locations, is probably significantly larger than the 1/(fraction of measure they control). This is because the fraction of measure controlled by consequentialists is probably much larger than the fraction of measure corresponding to “interesting” invocations of SI.
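As a rough back-of-the-envelope (the symbols and the numbers here are mine, purely illustrative, not from the original discussion):

```latex
% c = fraction of universal-prior measure controlled by consequentialists
% i = fraction of measure corresponding to "interesting" invocations of SI
% Restricting attention to interesting worlds buys a factor of 1/i;
% controlling only a fraction c of the measure costs a factor of c.
\[
  \frac{\text{manipulators' posterior weight}}{\text{intended model's weight}}
  \;\approx\; \frac{c}{i},
  \qquad \text{e.g. } c = 2^{-10},\ i = 2^{-50}
  \;\Rightarrow\; \frac{c}{i} = 2^{40}.
\]
```

So the argument only needs c >> i, which is exactly the claim in the last sentence above.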
(This argument ignores computational limitations of the consequentialists.)
they have to run simulations of all kinds of possible universes to work out which ones they care about and where in the multiverse Solomonoff inductors are being used to make momentous decisions
I think that they have an easy enough job but I agree the question is a little bit complicated and not argued for in the post. (In my short response I was imagining the realtime component of the simulation, but that was the wrong thing to be imagining.)
I think the hardest part is not from the search over possible universes but from cases where exact historical simulations get you a significant prediction advantage and are very expensive. That doesn’t help us if we build agents who try to reason with Solomonoff induction (since they can’t tell whether the simulation is exactly historically accurate any better than the simulators can) but it could mean that the actual universal prior conditioned on real data is benign.
(Probably this doesn’t matter anyway, since the notion of “large” is relative to the largest 1% of universes in the universal prior or something---it doesn’t matter whether small universes are able to simulate us, if we get attention from very big universes some of whose inhabitants also care about small universes. But again, I agree that it’s at least complicated.)
An asymmetry like this seems necessary because specifying the input channel accounts for pretty much all of the complexity in the intended model.
The consequentialist can optimize to use the least awkward output channel, whatever it is.
They get the anthropic update, including a lot of info about the choice of universal prior.
They can focus on important decisions without having to specify what that means.
Realistically the “intended model” is probably also something like “find important bitstrings that someone is trying to predict with the universal prior,” but it would have to be able to specify that in really few bits in order to compete with the malicious model, while the consequentialists are basically going to use the optimal version of that.
they perhaps need to simulate their own universe to work out which plausible input/output channels they want to target
Their goal is basically to find a simple model for their physics. I agree that in some universes that might be hard. (Though it doesn’t really matter if it’s hard in 90% of them; unless you get a lot of different 90%’s, you would need to claim that only a pretty small fraction of possible worlds create such simulations.)
if they do this then all they get in return is a pretty measly influence over our beliefs (since they’re competing with many other daemons in approximately equally similar universes who have opposing values)
I don’t think this works as a counterargument: if the universal prior is benign, then they get lots of influence by the argument in the post. If it’s malign, then you’re conceding the point already. I agree that this dynamic would put a limit on how much (people believe that) the universal prior is getting manipulated, since if too many people manipulate it the returns drop too low, but the argument in the post then implies that the equilibrium level of manipulation is such that a large majority of probability mass is manipulators.
Also, I have a different view than you about how well acausal coordination works; I’d expect people to make an agreement to use this influence in service of compromise values, but I understand that’s a minority view.
The “manipulative” program is something like: run this simple cellular automaton and output the value at location X, once every T steps, starting from step T0. It doesn’t encode extra stuff to become manipulative at some point; it’s just that the program simulates consequentialists who can influence its output and for whom the manipulative strategy is a natural way to expand their influence.
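To make the shape of that program concrete, here is a minimal sketch; the choice of Rule 110 and the constants WIDTH, X, T, and T0 are mine and purely illustrative:

```python
# Minimal sketch of the shape of the "manipulative" program described above.
# The automaton (Rule 110) and the constants are arbitrary illustrative choices.

RULE = 110               # elementary cellular automaton rule
WIDTH = 1024             # tape width (toy stand-in for a huge simulation)
X, T, T0 = 512, 7, 1000  # read location, stride, and starting step

def step(cells):
    """Advance the automaton one step (wrapping at the edges)."""
    n = len(cells)
    return [
        (RULE >> (cells[(j - 1) % n] << 2 | cells[j] << 1 | cells[(j + 1) % n])) & 1
        for j in range(n)
    ]

def output_bits(num_bits):
    """Run the automaton; emit the cell at X once every T steps, starting at T0."""
    cells = [0] * WIDTH
    cells[WIDTH // 2] = 1  # simple initial condition
    bits = []
    for t in range(T0 + num_bits * T):
        if t >= T0 and (t - T0) % T == 0:
            bits.append(cells[X])
        cells = step(cells)
    return bits

print(output_bits(20))
```

The whole description is just the automaton rule plus (X, T, T0); anything that makes the output “manipulative” would have to happen inside the simulated dynamics, not in the program text.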
(I’m happy to just drop this, or someone else can try to explain, sorry the original post is not very clear.)
I didn’t have something in mind (other than what I wrote in the post), I just don’t understand your argument at all and I suspect I’m missing something.
Your argument sounds analogous to: “Simulating the universe requires simulating the Earth in 1997. So the shortest program for simulating the universe must be longer than the shortest program for simulating the Earth in 1997.”
If manipulating our world involves getting good at predicting it along the way, it can’t be encoded as a shorter program than one that simply predicts our world.
A version of this argument might show that it can’t be encoded as a faster program, but a short program can run a long program as a subroutine.
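As a toy illustration of that last point (mine, not from the discussion), description length and the size of the computation a program runs come apart completely:

```python
# A ~3-line program that constructs and executes a ~10,000-line program
# as a subroutine: short description, long computation.
long_program = "x = 0\n" + "\n".join(f"x += {k}" for k in range(10_000)) + "\nprint(x)"
exec(long_program)  # prints 49995000
```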
Unnatural output channel: essentially the same thing applies to the “intended” model that you wanted Solomonoff induction to find. If you are modeling some sequence of bits coming in from a camera, the fact that “most input channels just start from the beginning of time” isn’t really going to help you. What matters is the relative simplicity of the channels that the simulators control vs. channels like “the bits that go into a camera,” and it’s hard for me to see how the camera could win.
Computational constraints: the computations we are interested in aren’t very expensive in total, and can be run once but then broadcast across many output channels. For similar reasons, the computational complexity doesn’t really restrict what output channels they can use. A simulator could simulate your world, collect the bits from your camera, and then send those bits on whatever output channel. It’s not obvious this works exactly as well when there is also an input channel (rather than in the pure Solomonoff induction case), but I think it does.
Unnatural input channel: Seems like the same thing as the unnatural output channel. I haven’t thought as much about the case with an input channel in general, I was focusing on the universal distribution itself, but I’d be surprised if it changed the picture.
How large a magnet would you need to be noticeable at a reasonable distance? (E.g. how large a magnet would you need for us to have noticed it at a depth of 30 feet or whatever?)
How expensive is the cheapest magnet of that size, and will it last 500M years?
Intuitively, I wouldn’t expect that you can use magnets to significantly increase the visibility of the beacon.
I’m not very clear on the underlying geology here. Are there points on the earth where a large beacon deposited on the surface would have a chance of remaining on the surface for hundreds of millions of years?
I don’t think I’m getting your point here. Personally, it seems safe to say that >80% of the contingent of my moral parliament that cares about astronomical waste would say that if our universe was capable of 10^(10^120) operations it would be at least 100x as valuable as if it was capable of only 10^120 operations. Are your numbers different from this? In any case, what implications are you suggesting based on “no domination”?
I might have given 50% or 60% instead of >80%.
I don’t understand how you would get significant conclusions out of this without big multipliers. Yes, there are some participants in your parliament who care more about worlds other than this one. Those worlds appear to be significantly harder to influence (by means other than trade), so this doesn’t seem to have a huge effect on what you ought to do in this world. (Assuming that we are able to make trades that we obviously would have wanted to make behind the veil of ignorance.)
In particular, if your ratio between the value of big and small universes was only 5x, then that would only have a 5x multiplier on the value of the interventions you list in the OP. Given that many of them look very tiny, I assumed you were imagining a much larger multiplier. (Something that looks very tiny may end up being a huge deal, but once we are already wrong by many orders of magnitude it doesn’t feel like the last 5x has a huge impact.)
I don’t understand this part at all. Please elaborate?
We will have control over astronomical resources in our universe. We can then acausally trade that away for influence over the kinds of universes we care about influencing. At equilibrium, ignoring market failures and friction, how much you value getting control over astronomical resources doesn’t depend on which kinds of astronomical resources you in particular terminally value. Everyone instrumentally uses the same utility function, given by the market-clearing prices of different kinds of astronomical resources. In particular, the optimal ratio between (say) hedonism and taking-over-the-universe depends on the market price of the universe you live in, not on how much you in particular value the universe you live in. This is exactly analogous to saying: the optimal tradeoff between work and leisure depends only on the market price of the output of your work (ignoring friction and market failures), not on how much you in particular value the output of your work.
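For what it’s worth, the price-taking logic behind the work/leisure analogy is just the standard consumer first-order condition (nothing here is specific to the acausal case; the symbols are generic):

```latex
% Agent i spends wealth w_i on two kinds of resources x and y at market
% prices p_x and p_y. At an interior optimum, every agent's marginal rate
% of substitution equals the same price ratio, no matter how their
% utility u_i weights the two resources:
\[
  \max_{x,y}\; u_i(x,y)
  \quad \text{s.t.} \quad p_x x + p_y y = w_i
  \quad \Longrightarrow \quad
  \frac{\partial u_i/\partial x}{\partial u_i/\partial y} = \frac{p_x}{p_y}.
\]
```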
So the upshot is that instead of using your moral parliament to set prices, you want to be using a broader distribution over all of the people who control astronomical resources (weighted by the market prices of their resources). Our preferences are still evidence about what others want, but this just tends to make the distribution more spread out (and therefore cuts against e.g. caring much less about colonizing small universes).
Isn’t “how they should deliberate/learn/self-modify” itself a difficult philosophical problem (in the field of meta-philosophy)? If it’s somehow easier or safer to “give them an AI that doesn’t defer to them about how they should deliberate/learn/self-modify” than to “give them an AI that doesn’t defer to them about philosophy” then I’m all for that, but it doesn’t seem like a very different idea from mine.
I still don’t really get your position, and especially why you think:
It seems to me that the best way to avoid both of these outcomes [...] is to make sure that the first advanced AIs are highly or scalably competent in philosophy.
I do understand why you think it’s an important way to avoid philosophical errors in the short term; in that case I just don’t see why you think that such problems are important relative to other factors that affect the quality of the future.
This seems to come up a lot in our discussions. It would be useful if you could make a clear statement of why you think this problem (which I understand as: “ensure early AI is highly philosophically competent” or perhaps “differential philosophical progress,” setting aside the application of philosophical competence to what-I’m-calling-alignment) is important, ideally with some kind of quantitative picture of how important you think it is. If you expect to write that up at some point then I’ll just pause until then.
Another intuition pump here is to consider a thought experiment where you think there’s a 50/50 chance that our universe supports either 10^120 operations or 10^(10^120) operations (and controlling other universes isn’t possible). Isn’t there some large coalition of total utilitarians in your moral parliament who would be at least 100x happier to find out that the universe supports 10^(10^120) operations (and be willing to bet/trade accordingly)?
I totally agree that there are members of the parliament who would assign much higher value on other universes than on our universe.
I’m saying that there is also a significant contingent that cares about our universe, so the people who care about other universes aren’t going to dominate.
(And on top of that, all of the contingents are roughly just trying to maximize the “market value” of what we get, so for the most part we need to reason about an even more spread out distribution.)
Yeah I didn’t make this clear, but my worry here is that most humans won’t choose to “deliberate/learn/self-modify” in a way that leads to philosophical maturity (or construct a new AI with greater philosophical competence and put it in charge), if you initially give them an AI that has great intellectual abilities in most areas but defers to humans on philosophical matters.
There are tons of ways you could get people to do something they won’t choose to do. I don’t know if “give them an AI that doesn’t defer to them about philosophy” is more natural than e.g. “give them an AI that doesn’t defer to them about how they should deliberate/learn/self-modify.”
I guess with measure-based utilitarianism, it’s more about density of potentially valuable things within the universe than size. If our universe only supports 10^120 available operations, most of it (>99%) is going to be devoid of value under many ethically plausible ways of distributing caring-measure over the space-time regions within a universe.
I agree, but if you have a broad distribution over mixtures then you’ll be including many that don’t use literal locations and those will dominate for “sparse” universes.
I can see easily how you’d get a modest factor favoring other universes over astronomical waste in this universe, but as your measure/uncertainty gets broader (or you have a broader distribution over trading partners) the ratio seems to shrink towards 1 and I don’t feel like “orders of magnitude” is that plausible.
Some people seem to think there’s a good chance that our current level of philosophical understanding is enough to capture most of the value in this universe. (For example, if we implement a universe-wide simulation designed according to Eliezer’s Fun Theory, or if we just wipe out all suffering.) Others may think that we don’t currently have enough understanding to do that, but we can reach that level of understanding “by default”. My argument here is that both of these seem less likely if the goal is instead to capture value from larger/richer universes, and that gives more impetus to trying to improve our philosophical competence.
I agree this is a further argument for needing more philosophical competence. I personally feel like that position is already pretty solid but I acknowledge that it’s not a universal position even amongst EAs.
They’re not supposed to be related except in so far as they’re both arguments for wanting AI to be able to help humans correct their philosophical mistakes instead of just deferring to humans.
“Defer to humans” could mean many different things. This is an argument against AI forever deferring to humans in their current form / with their current knowledge. When I talk about “defer to humans” I’m usually talking about an AI deferring to humans who are explicitly allowed to deliberate/learn/self-modify if that’s what they choose to do (or, perhaps more importantly, to construct a new AI with greater philosophical competence and put it in charge).
I understand that some people might advocate for a stronger form of “defer to humans” and it’s fine to respond to them, but wanted to make sure there wasn’t a misunderstanding. (Also I don’t feel there are very many advocates for the stronger form, I think the bulk of the AI community imagines our AI deferring to us but us being free to design better AIs later.)
I agree that’s a concern for a small map (especially if the interpretation is complicated).
For a large payload, I’m not nearly as concerned about that. Maybe I’m too optimistic.
I don’t understand why you think that the expectation should be orders of magnitude larger for other universes. The model “like utilitarianism, but with an upper bound on # of people” seems kind of wacky; maybe it gets a seat in the moral parliament but I don’t think it’s the dominant force for caring about astronomical waste. For non-counting-measure utilitarianism, I also don’t see why the models concerned about astronomical waste would assign larger universes an overwhelming share of our caring-measure.
It also feels to me like you are 2-enveloping wrong if you end up with a 100x ratio here. (I.e., if you have 10% probability on a model where the two are equal, I don’t think you should end up with 100x.)
Overall it seems like a fairly safe conclusion that the part of you that is attracted by the idea of preventing astronomical waste (or a large fraction of that part of you) probably shouldn’t stop at just preventing astronomical waste in this universe.
If you put 50% on a theory that cares overwhelmingly about infinite universes and 50% on a theory that cares about all universes, the thing to do is probably still to prevent astronomical waste in this universe, so that we can later engage in trade or spend the resources exploring whatever angles of attack seem useful. Maybe this is the kind of thing you have in mind, but it’s a notable special case because it seems to recommend the same short-term behavior.
trying to preserve and improve the collective philosophical competence of our civilization, such that when it becomes possible to pursue strategies like ones listed above, we’ll be able to make the right decisions.
I agree that if we don’t eventually reach philosophical maturity (or end up on an approximately optimal philosophical trajectory) then we won’t capture most of the value in the universe. It seems like that conclusion doesn’t really depend on infinite universes though (e.g. a utilitarian might be similarly concerned about discovering how to optimally organize matter), unless you think this is the main way our preferences might not be easily satiable.
The best opportunity to do this that I can foresee is the advent of advanced AI, which is another reason I want to push for AIs that are not just value aligned with us, but also have philosophical competence that scales with their other intellectual abilities, so they can help correct the philosophical errors of their human users (instead of merely deferring to them), thereby greatly improving our collective philosophical competence.
This doesn’t seem related to recent discussions about philosophical competence and AI, since it is about what we want AI to do eventually rather than what you want to do in the 21st century (I’m not sure if it was supposed to be related).
I agree that literal total utilitarianism doesn’t care about any worlds at all except infinite worlds (and for infinite worlds its preferences are undefined). I think it is an unappealing moral theory for a number of reasons (as are analogs with arbitrary but large bounds), and so it doesn’t have much weight in my moral calculus. In particular, I don’t think that literal total utilitarianism is the main component of the moral parliament that cares about astronomical waste.
(To the extent it was, it would still advocate getting “normal” kinds of influence in our universe, which are probably dominated by astronomical waste, in order to engage in trade, so it also doesn’t seem to me like this argument would change our actions too much, unless we are making a general inference about the “market price” of astronomical resources across a broad basket of value systems.)
Is your more general point that we might need to make moral trades now, from behind the veil of ignorance?
I agree that some value is lost that way. I tend to think it’s not that large, since:
I don’t see particular ways we are losing large amounts of value.
My own moral intuitions are relatively strong regarding “make the trades you would have made from behind the veil of ignorance”; I don’t think that I literally need to remain behind the veil. I expect most people have similar views or would have. (I agree this isn’t 100%.)
It seems like we can restore most of the gains with acausal trade at any rate, though I agree not all of them.
If your point is that we should figure out what fraction of our resources to allocate towards being selfish in this world: I agree there is some value lost here, but again it seems pretty minor to me given:
The difficulty of doing such trades early in history (e.g. the parts of me that care about my own short-term welfare are not effective at making such trades based on abstract reasoning, since their behavior is driven by what works empirically). Even though I think this will be easy eventually it doesn’t seem easy now.
The actual gains from being more selfish are not large. (I allocate my resources roughly 50/50 between impartial and self-interested action. I could perhaps make my life 10-20% better by allocating them all to self-interested action, which implies that I’m effectively paying a 5x penalty to spend more resources in this world.)
Selfish values are still heavily influenced by what happens in simulations, by the way that my conduct is evaluated by our society after AI is developed, etc.
Do I terminally value reproduction? What if we run an RL algorithm with genetic fitness as its reward function?
If we define terminal values in the way you specify, then “aligned” (whether parochially or holistically) does not seem like a strong enough condition to actually make us happy about an AI. Indeed, most people at MIRI seem to think that most of the difficulty of alignment is getting from “has X as explicit terminal goal” to “is actually trying to achieve X.”
Although it feels intuitive, I’m not satisfied with the crispness of this definition, since we don’t have a good way of determining a black box system’s intentions
Are you also unsatisfied with “holistic alignment” or “parochial alignment” as crisp concepts, since we don’t have a way of determining a black box system’s terminal values?
Yes. The detailed dynamics depend a lot on the particular commodity, and how elastic we expect demand to be; for example, over the long run I expect GDP/oil to go way up as we move to better substitutes, but over a short period where there aren’t good substitutes it could stay flat.
Consider the set of bonds issued and held. The number issued is equal to the number held. If I increase the number held by 1, then I will bid up the price of bonds, both decreasing the # held by other people (since more expensive bonds are less attractive to hold) and increasing the # issued (since more expensive bonds are more attractive to issue). Those two effects must add up to 1.
If I increase the number of bonds issued, that means someone new has a loan.
If I decrease the number of bonds held, then that means someone else was going to buy a loan and instead they didn’t. That means they spent the money on something else, or perhaps left it under their mattress.
Either way, the money gets out of the financial system.
Temporarily setting aside the fact that “don’t save” is a perfectly valid way of getting money out of the financial system with a similar effect on inflation: your implicit position seems to be that if I try to buy a bond, the effect will just be to stop someone else from holding a bond rather than to cause any new bonds to be issued. I.e., that if I decrease the interest rate on a bond by 0.01%, that has a much (>20x) larger effect on decreasing people’s willingness to hold bonds than on increasing their willingness to issue bonds. That’s true over the very short term, since taking out a loan takes longer than selling a loan, but it’s an extremely strange view over the medium term. It also seems to contradict observed behavior, where people, and especially firms, consider interest rates before taking out a loan (in fact, that effect seems much stronger than the effect of interest rates on savings).
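A toy linear supply/demand model (my own illustration, with made-up price sensitivities) shows how one extra bond purchase splits between the two margins:

```python
# Toy linear bond market (all numbers made up). b = price-sensitivity of
# issuance (supply), d = price-sensitivity of others' holdings (demand).

def split_of_one_extra_purchase(b, d):
    """If I buy 1 extra bond, the price rises by 1/(b+d); issuance rises
    by b/(b+d) and others' holdings fall by d/(b+d). The two effects sum
    to exactly 1."""
    dp = 1.0 / (b + d)
    return b * dp, d * dp  # (extra bonds issued, bonds no longer held by others)

print(split_of_one_extra_purchase(b=1.0, d=1.0))   # balanced: (0.5, 0.5)
print(split_of_one_extra_purchase(b=1.0, d=20.0))  # "crowding out only" view: (~0.05, ~0.95)
```

The “just stops someone else from holding a bond” position corresponds to assuming d >> b, which is what seems strange over the medium term.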
Also, note that the same substitution effect happens when I make a loan. If I hadn’t loaned Alice money, she would have taken out a loan from someone else. Does anything have any effect on the world?
showing the extent to which gold is a favored store of value slash the growth rate of wealth as opposed to income/production (GDP), and what that implies about the value of gold as an investment and the forward interest rate. It’s much cleaner than how I’d been thinking about it before, and the logic likely extends (amusing graph: GDP-to-Bitcoin, world is clearly ending).
Sure, but we don’t expect a long-term positive trend in GDP/gold. If it’s roughly constant over the long term, the conclusion is “people’s attitude towards gold isn’t changing too much,” not “we aren’t getting richer after all.”