I’d be interested in hearing more about Ryan’s proposal to do better generalization science (or, if you don’t have much more to say in the podcast format, I’d be interested in seeing the draft about it).
interstice
He’s talking about the stuff around the simulation hypothesis and acausal trade in the preceding section.
Yeah “esoteric” perhaps isn’t the best word. What I had in mind is that they’re relatively more esoteric than “AI could kill us all” and yet it’s pretty hard to get people to take even that seriously! “Low-propensity-to-persuade-people” maybe?
but “extremely unlikely” seems like an overstatement[...]
Yes this is fair.
Interesting. I’d wondered why you wrote so many pieces advising people to be cautious about more esoteric problems arising from AI, to an extent that seemed extremely unlikely to be implemented in the real world, but there being a chance simulators are listening to your arguments does provide an alternative avenue for influence.
The Electric Slide
I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers, who plausibly have a plurality of local control. I think you might be lumping non-human-valuers together in ‘far mode’ since we know little about them, but a priori they are likely about as different from each other as from human-valuers. There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se. There may also be a general acausal norm against extortion, since it moves away from the Pareto frontier of everyone’s values.
OK, so then so would whatever other entity is counterfactually getting more eventual control. But now we’re going in circles.
A very slightly perturbed superintelligence would probably conceive of itself as almost the same being it was before,
OK but if all you can do is slightly perturb it then it has no reason to threaten you either.
Do you not think that causing their existence is something they are likely to want?
But who is “they”? There’s a bunch of possible different future SIs (or if there isn’t, they have no reason to extort us). Making one more likely makes another less likely.
re: 4, I dunno about simple, but it seems to me that you most robustly reduce the amount of bad stuff that will happen to you in the future by just not acting on any particular threats you can envision. As I mentioned, there’s a bit of a “once you pay the danegeld” effect where giving in to the most extortion-happy agents incentivizes other agents to start counter-extorting you. Intuitively, the most extortion-happy agents seem likely to be a minority in the greater cosmos for acausal-normalcy reasons, so I think this effect dominates. And I note that you seem to have conceded that even in the mainline scenario you can envision, there will be some complicated bargaining process among multiple possible future SIs, which seems to increase the odds of acausal-normalcy-type arguments applying.

But again, I think an even more important argument is that we have little insight into possible extorters, what they would want us to do, and how much of our measure is in various simulations, etc. (Bonus argument: maybe most of our measure is in ~human-aligned simulations, since people who like humans can increase their utility and bargain by running us, whereas extorters would rather use the resources for something else.)

Anyway, I feel like we have gone over our main cruxes by now. Eliezer’s argument is probably an “acausal normalcy” type one; he’s written about acausal coalitions against utility-function-inverters in planecrash.
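To make the danegeld point concrete, here’s a minimal toy sketch. Every number in it is a made-up assumption for illustration (the split between unconditional and opportunistic extorters, the per-threat costs, the follow-through discount), not a claim about actual acausal dynamics; the only point is that the “paying recruits more extorters” term can dominate as long as extortion-happy agents are a minority.

```python
# Toy model of the "danegeld" effect. All numbers are made-up assumptions
# for illustration, not estimates about real (acausal or otherwise) extorters.

N_EXTORTION_HAPPY = 10     # agents who threaten you regardless of your policy
N_OPPORTUNISTIC = 90       # agents who only bother threatening known payers

COST_PER_PAYMENT = 1.0     # cost of complying with one threat
COST_IGNORED_THREAT = 0.2  # expected cost of a threat you ignore (discounted,
                           # since following through is costly for the threatener)

def expected_cost(always_pay: bool) -> float:
    """Expected total cost of a fixed policy under the assumptions above."""
    if always_pay:
        # Paying makes you a profitable target, so the opportunists pile on too.
        return (N_EXTORTION_HAPPY + N_OPPORTUNISTIC) * COST_PER_PAYMENT
    # A credible never-pay policy leaves only the unconditional extorters,
    # and they rarely follow through.
    return N_EXTORTION_HAPPY * COST_IGNORED_THREAT

print("always-pay policy:", expected_cost(True))   # 100.0
print("never-pay policy:", expected_cost(False))   # 2.0
```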
So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do? At the very least, I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats. Not to mention how unclear it is what actions lead to which AIs, how much influence you actually have (likely negligible), the possibility that we are in a simulation, aliens… And we are almost certainly ignorant of many other crucial considerations.
the original basilisk is more “Schelling-ish” than the others and so probably more likely
But the Schelling-ishness of a future ASI to largely clueless humans is a very tiny factor in how likely it is to come to exist; the unknown dynamics of the singularity will determine this.
as a category, it is in their interest to behave as a whole in the context of acausally extorting humanity
It’s not clear that they form a natural coalition here. E.g., some of them might have directly opposed values. Or some might impartially value the welfare of all beings. If I had to guess, it seems plausible that human-aligned-ish values are a plurality fraction of possible future AIs (basically because: you might imagine that we either partially succeed at alignment or fail; if we fail, then the resulting values are effectively random, and the space of values is large, leaving aligned-ish values as the largest cluster (even if not a majority). Not sure of this, but it seems plausible. LLM-descended AIs might also see us as something like their ancestor.)
but if the former, then the basilisk remains a threat
OK, but the AI A/B scenario can also apply here as long as there is more than one possible outcome of the singularity (or even if not, since we could be in a simulation right now).
Because there is coherent logic behind it
I just don’t agree that the scenario you’ve presented is more plausible or logically compelling than the ones I’ve sketched in my OP. But none are that compelling because we just lack any good model of this domain.
As a meta-note: when faced with some weird, compelling abstract argument that is hard to evaluate precisely, it can be rational to fall back on “common sense”. Why? Because your brain is corrupted hardware: it can generate conclusions and intuitive feelings of plausibility based on emotions. “Common sense” is the default option found to be relatively sane across the rest of humanity. In your case the emotion seems to be anxiety about an imagined future scenario. As for “But the stakes are so high that it’s worth discounting that even if the objective probability is higher”: note that you are essentially being Pascal’s-mugged by your brain.
The guy who wrote this essay went on to make an “interactive building-as-computer” thing called Dynamicland.
I mean, why should I take your claim of non-ignorance seriously? By default we should not expect to have great insight into the decision procedures of a future superintelligence.[1] Sure, we can predict some stuff like not violating light speed and wanting mass and energy (probably…), but those are things we have a very solid theoretical understanding of; that really isn’t the case with acausal trade or decision theory generally. Likewise, with chess we have a good theoretical understanding and extensive empirical experience. “There might be a future superintelligence who would torture you if you don’t help create it” is just way too weak an argument to confidently predict anything or recommend any particular actions (how does your argument deal with the possibility of multiple possible future SIs, as above, for one thing? This seems like the strong default assumption). Like, what even are the actions you think the SI will want you to take?
[1] You can think of a future superintelligence as having undergone millions of years of effective history, had multiple conceptual revolutions upending its understanding of reality, etc. It’s hard to say anything about such a being with confidence!
Would you agree that the most commonly feared form of the basilisk is more of a Schelling point?
Not really. I think we have ~no clue what the Schelling point of acausal coordination for superintelligences looks like (if one exists).
We seemingly have no idea what potential future extorters would want us to do. OK, you can imagine an AI that really wants to come into existence and will torture you if you didn’t help create it. But what if there are actually two AIs that want to come into existence, who each really hate the other, and AI B will torture you if you helped AI A come into existence! Or maybe future humanity in some Everett branches will make a gazillion simulations of everyone so that most of their measure is there, and they’ll punish/reward you for helping the basilisks! Or maybe… etc.
In reality, it’s likely that something weirder that no one anticipated will happen. The point is we have no idea what to expect, which makes threatening us pointless, since we don’t know what action extorters would want us to take. If you think you have a good enough picture of the future that you do know, you’re probably (very) overconfident.
I guess it depends on whether you have a preference for variety in the world in general, or in your own actions/experiences. But even in the world-in-general case, there would be a force towards convergence in what overall gets optimized for across different worlds. (Unless your preference is over variety across possible worlds, but that starts to seem a bit unnatural/hard to optimize for.)
[Epistemic status: vague speculation] I like the idea of consciousness being allocated based on projecting influence backwards into the past. Humans are currently the most conscious beings because we are the densest nexus of influence over the future, but this will eventually change. This seems to have the requisite self-consistency properties; e.g., if you are aiming to have a large influence over the future, it’s probably important to track other beings with the same property.
ETA: another perhaps better possibility is that consciousness is about being a bottleneck of information between the past and future.