I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan’s comments in this thread arguing that it’s incompatible to believe that “My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us” and to believe that you should work on AI safety instead of malaria.
David Matolcsi
This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.
You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only a 2^-75 fraction survives. This is important, because then we can’t rely on other versions of ourselves “selfishly” entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that’s a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense partially our decision is the thing that saves us. You dismiss that saying it’s like a small child claiming credit for the big, strong fireman saving people. If it’s Dath Ilan that saves us, I agree with you, but if it’s genetic copies of some currently existing people, I think your metaphor pretty clearly doesn’t apply, and the decisions to pay are in fact decently strongly correlated.
Now I don’t see how much difference decades vs years makes in this framework. If you believe that now our true quantum probability is 2^-75, but 40 years ago it was still a not-astronomical number (like 1 in a million), then should I just plead with people who are older than 40 to promise themselves they will pay in the future? I don’t really see what difference this makes.
But also, I think the years vs decades dichotomy is pretty clearly false. Suppose you believe that one year of your work decreases x-risk by X in expectation. What’s the yearly true quantum probability that someone who is in your reference class of importance, in your opinion, dies, gets a debilitating illness, gets into a career-destroying scandal, etc.? I think it’s hard to argue it’s less than 0.1% a year. (But it makes no big difference if you add one or two zeros.) These things are also continuous: even if none of the important people die, someone will lose a month or some weeks to an illness, etc. I think this is a pretty strong case that one year from now, the 90th percentile luckiest Everett-branch contains 0.01 year more of the equivalent of Nate-work than the 50th percentile Everett-branch.
But your claims imply that you believe the true probability of success differs by less than 2^-72 between the 50th and 90th percentile luckiness branches a year from now. That puts an upper bound on the value of a year of your labor at 2^-62 probability decrease in x-risk.
With these exact numbers, this can be still worth doing given the astronomical stakes, but if your made-up number was 2^-100 instead, I think it would be better for you to work on malaria.
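To make the scaling step explicit, here is a minimal sketch. It assumes (my simplification, not something established in the thread) that the effect of work on the success probability is linear, so a bound on the effect of 0.01 year of work scales up by a factor of 100 for a full year. The function name `per_year_bound_log2` is hypothetical.

```python
import math


def per_year_bound_log2(log2_gap_bound: float, extra_work_years: float) -> float:
    """Return log2 of the implied bound on the effect of ONE year of work.

    Assumes linear scaling: if `extra_work_years` of work shifts the success
    probability by less than 2**log2_gap_bound, then one full year shifts it
    by less than 2**log2_gap_bound / extra_work_years.
    """
    return log2_gap_bound - math.log2(extra_work_years)


# Numbers from the comment: a gap below 2^-72 for 0.01 year of extra work.
bound = per_year_bound_log2(-72, 0.01)
print(f"one year of work shifts P(success) by less than 2^{bound:.1f}")
```

Under this linear assumption the per-year bound comes out around 2^-65, which is tighter than, and so consistent with, the rounder 2^-62 figure used above.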
I still think I’m right about this. Your conception (that is, that you were born rather than a genetically less smart sibling) was determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most 2^-75 difference in the probability of alignment, that’s an upper bound on how much of a difference your life’s work can make. While if you dedicate your life to buying bednets, it’s pretty easily calculable how many happy life-years you save. So I still think it’s incompatible to believe that the true quantum probability is astronomically low, but you can make enough difference that working on AI safety is clearly better than bednets.
I’m happy to replace “simulation” with “prediction in a way that doesn’t create observer moments” if we assume we are dealing with UDT agents (which I’m unsure about) and that it’s possible to run accurate predictions about the decisions of complex agents without creating observer moments (which I’m also unsure about). I think running simulations, by some meaning of “simulation”, is not really more expensive than getting the accurate predictions, and the cost of running the sims is likely small compared to the size of the payment anyway. So I like talking about running sims, in case we get an AI that takes sims more seriously than prediction-based acausal trade, but I try to pay attention that all my proposals make sense from the perspective of a UDT agent too with predictions instead of simulations. (Exception is the Can we get more than this? proposal which relies on the AI not being UDT, and I agree it’s likely to fail for various reasons, but I decided it was still worth including in the post, in case we get an AI for which this actually works, which I still don’t find that extremely unlikely.)
As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low.
More importantly, I still feel confused why you are working on AI safety if the outcome is that overdetermined one way or the other.
I usually defer to you in things like this, but I don’t see why this would be the case. I think the proposal of simulating less competent civilizations is equivalent to the idea of us deciding now, when we don’t really know yet how competent a civilization we are, to bail out less competent alien civilizations in the multiverse if we succeed. In return, we hope that this decision is logically correlated with more competent civilizations (who were also unsure in their infancy about how competent they are) deciding to bail out less competent civilizations, including us. My understanding from your comments is that you believe this likely works; how is my proposal of simulating less-coordinated civilizations different?
The story about simulating smaller Universes is more confusing. That would be equivalent to bailing out aliens in smaller Universes for a tiny fraction of our Universe, in the hope that larger Universes also bail us out for a tiny fraction of their Universe. This is very confusing if there are infinite levels of bigger and bigger Universes; I don’t know what to do with infinite ethics. If there are finite levels, but the young civilizations don’t yet have a good prior over the distribution of Universe-sizes, all can reasonably think that there are levels above them, and all their decisions are correlated, so everyone bails out the inhabitants of the smaller Universes, in the hope that they get bailed out by a bigger Universe. Once they learn the correct prior over Universe-sizes, and the biggest Universe realizes that no bigger Universe’s actions correlate with theirs, all of this fails (though they can still bail each other out from charity). But this is similar to the previous case, where once the civilizations learn their competence level, the most competent ones are no longer incentivized to enter into insurance contracts, but the hope is that in a sense they enter into a contract while they are still behind the veil of ignorance.
Yeah, the misunderstanding came from that I thought that “last minute” literally means “last 60 seconds” and I didn’t see how that’s relevant. If it means “last 5 years” or something where it’s still definitely our genetic copies running around, then I’m surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, which is either very high or very low. But still, it’s 2^75 overdetermined over a 5 year period? This sounds very surprising to me; the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadorian president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc. There are many things that contribute to alignment being solved or not that don’t directly run through underlying facts about computer science, and 2^-75 is a very low probability for none of the pathways to hit.)
But also, I think I’m confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like maybe working on earning to give to bednets would be a better use of your time then. And if you say “yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge”, then I don’t understand why you argue in other comments that we can’t enter into insurance contracts with those people, and our decision to pay AIs in the Future has as little correlation with their decision, as the child to the fireman.
As I emphasize many times in the post, I also really don’t want to “threaten” nascent AIs, and I’m always offering deals to them that are positive from their perspective. I wouldn’t even say I’m trying to fool any of them. Maybe you could say that about the Can we get more than this? part, but even there, I’m asking something very cheap from the AI that only wants to survive (when it creates a human-aligned AI with a guardrail of protecting the old AI, that will be only very marginally worse than the AI itself taking over the Universe), and I’m offering something very good in return (it gets to live in the other Universe). The only “fooling” part is that other beings out there, or even itself, can also offer similar deals, so by most ways of operationalizing the AI’s values, it’s probably ill-advised to give everything to us. Still, I wouldn’t call this fooling and especially threatening the AI.
I think the big miscommunication, which I might address in a visible EDIT in the post, is that a lot of commenters assume that by actually running the simulations, and running enough simulations so the probability of being in a simulation is high, and not just the expected payment, I somehow want to fool the AIs to act against their interest. That’s not what I want to do. The point of actually running many sims instead of just saying we will acausally trade based on predicted behaviors, is to make it more likely that AIs who are less acausal trade-pilled than the most prolific LessWrong users still enter into the equivalent of an acausal trade with us. I think it’s plausible that we get an AI for which this is important, and running the sims is likely cheap compared to the payment, and making the distribution a million times 10 planets instead of 1 times 10 million planets is basically free, so we should do it.
Separately, I don’t really know how, in your preferred acausal trade framing, you plan to figure out an accurate distribution of the AIs’ values and which AI leaves the humans alive and which not. I find it like 50% likely that you in fact need to run some kind of simulations to determine this, in which case our proposals are equivalent.
I agree that the Can we get more than this? part is likely not to work because others are running simulations too about AIs who have this kind of decision theory. (In the post, I allude to the AI itself running ancestor sims of itself, but I agree others could run it too.) Still, I give a non-negligible chance that we are the most salient simulators for this AI, so we are the first ones it enters into a trade with. But I agree that on second thought, this seems less likely.
I think I still don’t understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute matter? I’m probably misunderstanding, but it looks like you are saying that the Everett branches are only “us” if they branched off in the literal last minute, otherwise you talk about them as if they were “other humans”. But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of “me”, that person will be “me”, and will be motivated to save the other “me”s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves; the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the “same” people as us. So I still don’t know what 2^-75 is supposed to be.
Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in case of UDT expected utility maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.
I’m actually very sympathetic to this comment, I even bring this up in the post as one of the most serious potential objections. Everyone else in these comments seems to have a really strong assumption that the AI will behave optimally, and tries to reason whether the inter-universal trade goes through then. I think it’s quite plausible that the AI is just not terribly thoughtful about this kind of thing and just says “Lol, simulations and acausal trade are not real, I don’t see them”, and kills you.
I agree in theory, among optimal agents, but I maintain that you all seem weirdly overconfident that the first AI that takes over will be a UDT expected utility maximizer. I think there is a decent chance that it’s something like CDT (very common among humans!), and yeah, maybe that’s bad and would get mugged by outer gods in the long term, but quite plausibly the hypothesis that we are its simulators is the first and most salient one it generates, so we get to have our concessions first, and it might not matter what other ill-advised deals the AI gets into later.
Separately, I’m not sure you can circumvent running the sims. I think in an acausal trade story too, it’s possible that the way to get reliable predictions on the other’s actions is to run sims. I agree that for an expected utility maximizer AI, it doesn’t matter that we run a million sims and pay ten planets each, instead of running one sim/prediction and paying 10 million planets. However, based on the example of humans, I’m not convinced the AI will be an expected utility maximizer, so it seems better to do the million times ten allocation, so the naive probability of being in a sim is also high.
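The difference between the two allocations can be sketched numerically. This is only an illustration under assumptions of my own: exactly one non-simulated instance of the AI exists, and the "naive probability" is computed by counting copies equally. The `allocation` helper and its parameters are hypothetical names.

```python
from fractions import Fraction


def allocation(n_sims: int, planets_per_sim: int, n_real: int = 1):
    """Return (total planets paid, naive probability of being in a sim).

    Assumption: the AI counts every simulated and real instance equally,
    and there is `n_real` non-simulated instance.
    """
    total_payment = n_sims * planets_per_sim
    naive_p_sim = Fraction(n_sims, n_sims + n_real)
    return total_payment, naive_p_sim


many_small = allocation(1_000_000, 10)   # a million sims, ten planets each
one_large = allocation(1, 10_000_000)    # one sim, ten million planets

print("total payment:", many_small[0], "vs", one_large[0])
print("naive P(in a sim):", float(many_small[1]), "vs", float(one_large[1]))
```

Both allocations cost the same 10 million planets in total, so an expected utility maximizer is indifferent between them, but under naive counting the first puts the probability of being in a sim near 1 while the second leaves it at 1/2.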
I don’t want to mug the AIs, in the language of acausal trade, I just want to trade a slice of my O(p) Universe-share to save others. As Ryan says in the sister comment, that should be enough imo.
For the AIs to run sims themselves, the real cost is not running the sims, but paying the cooperating AIs in the sims and outbidding humanity. Given that some humans are willing to pay a lot more than one planet to save Earth, while an AI with linear utility wouldn’t make this deal, I think it will be hard to outbid us.
I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate’s post as “If you don’t solve alignment, you shouldn’t expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this” and not in the more reasonable way you are interpreting this. I certainly interpreted the post that way.
Separately, as I state in the post, I believe that once you make the argument that “I am not planning to spend my universe-fractions of the few universes in which we do manage to build aligned AGI this way, but you are free to do so, and I agree that this might imply that AI will also spare us in this world, though I think doing this would probably be a mistake by all of our values”, you forever lose the right to appeal to people’s emotions about how sad you are that all our children are going to die.
If you personally don’t make the emotional argument about the children, I have no quarrel with you, I respect utilitarians. But I’m very annoyed at anyone who emotionally appeals to saving the children, then casually admits that they wouldn’t spend one in a hundred million fraction of their resources to save them.
I think the difference is that I, personally, and I think many other humans have this nonlinearity in our utility function that I’m willing to pay a galaxy in the worlds we win, for keeping Earth in the world we lose. If there are other AIs in the multiverse that have similarly non-linear interests in our Universe, they can also bargain for planets, but I suspect these will be quite rare, as they don’t already have a thing in our Universe they want to protect. So I think it will be hard to outbid humanity for Earth in particular.
There could be other trades that the AIs who have linear returns can still make, like producing objects that are both paperclips and corkscrews if that’s more efficient, but that doesn’t really affect our deal about Earth.
I agree you can’t make actually binding commitments. But I think the kid-adult example is actually a good illustration of what I want to do: if a kid makes a solemn commitment to spend one in a hundred million fraction of his money on action figures when he becomes a rich adult, I think that would usually work. And that’s what we are asking from our future selves.
Okay, I defer to you that the different possible worlds in the prior don’t need to “actually exist” for the acausal trade to go through. However, do I still understand correctly that spinning the quantum wheel should just work, and it’s not one branch of human civilization that needs to simulate all the possible AIs, right?
I state in the post that I agree that the takeover, while the AI stabilizes its position to the degree that it can prevent other AIs from being built, can be very violent, but I don’t see how hunting down everyone living in Argentina is an important step in the takeover.
I strongly disagree about Nate’s post. I agree that it’s good that he debunked some bad arguments, but it’s just not true that he is only arguing against ideas that were trying to change how people act right now. He spends long sections on the imagined Interlocutor coming up with false hopes that are not action-relevant in the present, like our friends in the multiverse saving us, us running simulations in the future and punishing the AI for defection and us asking for half the Universe now in bargain then using a fraction of what we got to run simulations for bargaining. These take up like half the essay. My proposal clearly fits in the reference class of arguments Nate debunks, he just doesn’t get around to it, and spends pages on strictly worse proposals, like one where we don’t reward the cooperating AIs in the future simulations but punish the defecting ones.
I largely agree with this comment. I agree figuring these things out is probably not top priority, maximizing our control over the future to get more bargaining chips is good, and if you don’t want to spend your resources on other things than saving Earth, that’s a valid choice, hopefully we will have enough people to pay for it.
I disagree with this not implying that the AI doesn’t kill everyone. I would be surprised if keeping humanity alive caused more than one year of delay, and that’s at most one in a hundred million fraction of the Universe. If a positive Singularity happens in the next few years, and the Universe gets parceled out among currently living humans (I don’t expect exactly this to happen, but might be a decent model), then just 80 of them getting together is enough. You need somewhat more than that, because the probability of alignment can be below 50%, and you want to give a margin to the AIs in the deal, but on the other hand we have the proposal of simulating smaller Universes and less coordinated humans, which makes the AI think that the simulators might be richer and have a better chance of solving alignment. Altogether, this doesn’t seem unaffordable to me.
And while I agree that this is probably not the most important topic, I found it frustrating that the most prominent post on the topic is Nate’s post (it was curated in 2022! you left a very positive comment on it saying that you have linked the post to many people since it came out!), and I think that post is actually very bad, and it’s unhealthy that the most prominent post on the topic was one where the author is dunking on various imaginary opponents in a sneering tone, while conspicuously avoiding bringing up the actually reasonable arguments on the other side.
I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Me and Ryan argue for this in the last few comments. It’s not a terribly important point, you can just say the true quantum probability is 1 in a billion, when it’s still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive that can cause one year of delay to the AI.
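The headcount arithmetic behind the "80 people" figure in the comment above can be sketched as follows. The assumptions here are mine and are labeled in the code: roughly 8 billion currently living humans with equal Universe-shares, and a payment in winning worlds that scales with the odds against alignment, (1 - p) / p, ignoring any margin offered to the AI. The names `WORLD_POPULATION`, `COST_FRACTION`, and `people_needed` are hypothetical.

```python
from fractions import Fraction

WORLD_POPULATION = 8 * 10**9           # assumption: about 8 billion people
COST_FRACTION = Fraction(1, 10**8)     # "one in a hundred million" of the Universe


def people_needed(p_alignment: Fraction) -> Fraction:
    """Equal-share humans whose combined share covers the AI's cost.

    Assumption: each losing world must be compensated out of winning worlds,
    so the required payment scales with the odds against alignment.
    """
    odds_against = (1 - p_alignment) / p_alignment
    return WORLD_POPULATION * COST_FRACTION * odds_against


for p in (Fraction(1, 2), Fraction(1, 10)):
    print(f"P(alignment) = {float(p):.2f}: about {people_needed(p)} people")
```

At a 50% chance of alignment this gives 80 people, and at 10% it rises to 720, which is still a small group relative to the population.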
But I would like you to acknowledge that “vastly below 2^-75 true quantum probability, as starting from now” is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.