The Sequences make it seem like the Many Worlds interpretation has solved this problem, but that’s not true.
No, Eliezer talks about this at some length. See The Born Probabilities:
[...] But what does the integral over squared moduli have to do with anything? On a straight reading of the data, you would always find yourself in both blobs, every time. How can you find yourself in one blob with greater probability? What are the Born probabilities, probabilities of? Here’s the map—where’s the territory?
I don’t know. It’s an open problem. Try not to go funny in the head about it.
This problem is even worse than it looks, because the squared-modulus business is the only non-linear rule in all of quantum mechanics. Everything else—everything else—obeys the linear rule that the evolution of amplitude distribution A, plus the evolution of the amplitude distribution B, equals the evolution of the amplitude distribution A + B.
When you think about the weather in terms of clouds and flapping butterflies, it may not look linear on that higher level. But the amplitude distribution for weather (plus the rest of the universe) is linear on the only level that’s fundamentally real.
Does this mean that the squared-modulus business must require additional physics beyond the linear laws we know—that it’s necessarily futile to try to derive it on any higher level of organization?
But even this doesn’t follow. [...]
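To spell out the linearity point in the quote (my gloss and notation, not Eliezer’s): unitary evolution acts linearly on amplitude distributions, while the Born rule is quadratic in them:

\[
U(\alpha\,\psi_A + \beta\,\psi_B) = \alpha\,U\psi_A + \beta\,U\psi_B,
\qquad
|\psi_A + \psi_B|^2 = |\psi_A|^2 + |\psi_B|^2 + 2\,\operatorname{Re}(\psi_A^{*}\,\psi_B).
\]

The cross term is interference, and it’s exactly what makes the squared modulus a non-linear function of the amplitudes.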
And Privileging the Hypothesis:
[...] But, said Scott, we might encounter future evidence in favor of single-world quantum mechanics, and many-worlds still has the open question of the Born probabilities.
This is indeed what I would call the fallacy of privileging the hypothesis. There must be a trillion better ways to answer the Born question without adding a collapse postulate that would be the only non-linear, non-unitary, discontinuous, non-differentiable, non-CPT-symmetric, non-local in the configuration space, Liouville’s-Theorem-violating, privileged-space-of-simultaneity-possessing, faster-than-light-influencing, acausal, informally specified law in all of physics. Something that unphysical is not worth saying out loud or even thinking about as a possibility without a rather large weight of evidence—far more than the current grand total of zero.
But because of a historical accident, collapse postulates and single-world quantum mechanics are indeed on everyone’s lips and in everyone’s mind to be thought of, and so the open question of the Born probabilities is offered up (by Scott Aaronson no less!) as evidence that many-worlds can’t yet offer a complete picture of the world. Which is taken to mean that single-world quantum mechanics is still in the running somehow.
In the minds of human beings, if you can get them to think about this particular hypothesis rather than the trillion other possibilities that are no more complicated or unlikely, you really have done a huge chunk of the work of persuasion. Anything thought about is treated as “in the running,” and if other runners seem to fall behind in the race a little, it’s assumed that this runner is edging forward or even entering the lead.
[… O]ur uncertainty about where the Born statistics come from should be uncertainty within the space of quantum theories that are continuous, linear, unitary, slower-than-light, local, causal, naturalistic, et cetera—the usual character of physical law. Some of that uncertainty might slop outside the standard space onto theories that violate one of these standard characteristics. It’s indeed possible that we might have to think outside the box. But single-world theories violate all these characteristics, and there is no reason to privilege that hypothesis.
The main claims Eliezer is criticizing in the QM sequence are that (1) reifying QM’s complex amplitudes runs afoul of Ockham’s Razor, (2) objective collapse is a plausible explanation for the Born probabilities, (3) QM shows that reality is ineffable, and (4) QM shows that there’s no such thing as reality. I don’t know what question of fact you think the Quantum Bayesians and Eliezer disagree about, or what novel factual claim QB is making. (I assume we agree ‘physical formalisms can be useful tools’ and ‘we can use probability theory to think about strength of belief’ aren’t novel claims.)
I think it generally makes sense to have highly upvoted recent sequences spotlighted at the top of the page, for the same reason it makes sense to have them spotlighted in ‘Curated’.
They can then be made rarer (or phased out entirely) once they’re less recent, if there’s less value in spotlighting them in the long run. I’ve generally had a hard time navigating from one post to the next, because the Embedded Agency and Fixed Points posts on LW often haven’t been included in any sequence.
If people tend to systematically make a certain mistake, then it’s worth asking whether there’s some causal factor behind it and whether that could be nudging us toward making the same mistake.
On the other hand, our general ability to solve problems and figure things out is presumably either staying the same, getting worse, or getting better. That’s a factual question that we should be able to learn about, and if (after trying to correct for biases) we did end up reaching a conclusion that resembles an old mistake, well, then it’s also possible that the truth resembles an old mistake.
This is a great post, and I think it does a good job of capturing why the two sides tend to talk past each other. A is baffled by why B claims to be able to reduce free-floating symbols to other symbols; B is baffled by why A claims to be using free-floating symbols.
They’re also both probably right when it comes to “defending standard usage”, and are just defending/highlighting different aspects of folk moral communication.
People often use “should” language to try to communicate facts; and if they were more self-aware about the truth-conditions of that language, they would be better able to communicate and achieve their goals. Harris thinks this is important.
People also often use “should” language to try to directly modify each other’s motivations. (E.g., trying to express themselves in ways they think will apply social pressure or tug at someone’s heartstrings.) Harris’ critics think this is important, and worry that uncritically accepting Harris’ project could conceal this phenomenon without making it go away.
(Well, I think the latter is less mysterian than the typical anti-Harris ethics argument, and Harris would probably be more sympathetic to the above framing than to the typical “ought is just its own thing, end of story” argument.)
The above is the full Embedded Agency sequence, cross-posted from the MIRI website so that it’s easier to find the text version on AIAF/LW (via search, sequences, author pages, etc.).
Scott and Abram have added a new section on self-reference to the sequence since it was first posted, and slightly expanded the subsequent section on logical uncertainty and the start of the robust delegation section.
(Abram has added a note to this effect in the post above, and in the text version.)
Abram has made a major update to the post above, adding material on self-reference and the grain of truth problem. The corresponding text on the MIRI Blog version has also been expanded, with some extra material on those topics plus logical uncertainty.
New material on paradoxes of self-reference
Revised material on logical uncertainty
The next part just went live, and it’s about exactly that! http://intelligence.org/embedded-models
Cross-posting some comments from the MIRI Blog:
Re: the 5/10 problem
I don’t get it. A human is obviously (in that regard) an agent reasoning about his actions. A human will also choose the $10 without any difficulty. What part of the human decision-making process isn’t formalizable here? (Assuming we agree that taking the $10 is the rational choice.)
Suppose you know that you take the $10. How do you reason about what would happen if you took the $5 instead? It seems easy if you know how to separate yourself from the world, so that you only think of external consequences (getting $5). If you think about yourself as well, then you run into contradictions when you try to imagine the world where you take the $5, because you know it is not the sort of thing you would do. Maybe you have some absurd predictions about what the world would be like if you took the $5; for example, you imagine that you would have to be blind. That’s alright, though, because in the end you are taking the $10, so you’re doing fine.
Part of the point is that an agent can be in a similar position, except it is taking the $5, knows it is taking the $5, and is unable to figure out that it should be taking the $10 instead, due to the absurd predictions it makes about what happens when it takes the $10. It seems kind of hard for a human to end up in that situation, but it doesn’t seem so hard to get this sort of thing when we write down formal reasoners, particularly when we let them reason about themselves fully (as natural parts of the world) rather than only reasoning about the external world or having pre-programmed divisions (so that they reason about themselves in a different way from how they reason about the world).
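Here’s a minimal runnable sketch of that failure mode (a toy model of mine: the names are hypothetical, and `provable` is a stub standing in for a real bounded proof search over the agent’s own source code):

```python
# Toy 5-and-10 agent. A real version would search for proofs about its own
# source code; here `provable` is a mock oracle returning the kind of
# sentences a Löbian proof search can end up proving.

def utility(action):
    # The actual environment: taking the $10 is simply better.
    return action  # taking $5 yields 5, taking $10 yields 10

def provable(sentence):
    # Mock proof search. The spurious counterfactual
    # "agent() = 10 -> utility() = 0" is provable because the agent can
    # prove it in fact takes the $5, which makes the antecedent false.
    provable_sentences = {
        "agent() = 5 -> utility() = 5",
        "agent() = 10 -> utility() = 0",  # spurious, but provable
    }
    return sentence in provable_sentences

def agent():
    best_action, best_payoff = None, -1
    for action in (5, 10):
        for payoff in (0, 5, 10):
            sentence = f"agent() = {action} -> utility() = {payoff}"
            if provable(sentence) and payoff > best_payoff:
                best_action, best_payoff = action, payoff
    return best_action

print(agent())  # prints 5: the agent talks itself out of the $10
```

The agent’s reasoning is internally valid; the trouble is in where the “provable” counterfactuals come from.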
It’s the first post! The posts are indexed on https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh and https://intelligence.org/embedded-agency/, but it looks like they’re not on LW?
I’d draw more of a connection between embedded agency and bounded optimality or the philosophical superproject of “naturalizing” various concepts (e.g., naturalized epistemology).
Our old name for embedded agency was “naturalized agency”; we switched because we kept finding that CS people wanted to know what we meant by “naturalized”, and we’d always say “embedded”, so...
“Embodiment” is less relevant because it’s about, well, bodies. Embedded agency just says that the agent is embedded in its environment in some fashion; it doesn’t say that the agent has a robot body, in spite of the cute pictures of robots Abram drew above. An AI system with no “body” it can directly manipulate or sense will still be physically implemented on computing hardware, and that on its own can raise all the issues above.
Example: you can think AGI alignment is worth working hard on even if (a) you only assign a 30% probability to success, and (b) you’re not incredibly excited and overjoyed to be working on it.
By assumption, this also isn’t a case where you find the work so inherently bleh that it’s actually not a good fit for you and you shouldn’t try. If you’d be sufficiently excited in the world where you thought the success odds were 70%, and your system 2 doesn’t think the difference between 70% and 30% odds is decision-relevant in this case, then it seems like something’s going wrong if you’re insufficiently motivated in the 30% case.
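To put rough numbers on that (my arithmetic, not part of the original example): if success is worth V and the costs of trying are similar in both worlds, moving from 70% to 30% odds scales the expected value by

\[
\frac{0.3\,V}{0.7\,V} \approx 0.43,
\]

i.e., it cuts the expected value by a factor of about 2.3. If the 70% world clearly justifies hard work, a factor of 2.3 will rarely be what flips the answer.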
Cool. I hadn’t thought to frame those problems in predictor terms, and I agree now that “only matters in multi-agent dilemmas” is incorrect.
That said, it still seems to me like policy selection only matters in situations where, conceptually, winning requires something like multiple agents who run the same decision algorithm meeting and doing a bit of logically-prior coordination, and something kind of like this separates things like transparent Newcomb’s problem (where policy selection is not necessary) from the more coordination-shaped cases. The way the problems are classified in my head still involves me asking myself the question “well, do I need to get together and coordinate with all of the instances of me that appear in the problem logically-beforehand, or can we each individually wing it once we see our observations?”
If anyone has examples where this classification is broken, I remain curious to hear them. Or, similar question: is there any disagreement on the weakened claim, “policy selection only matters in situations that can be transformed into multi-agent problems, where a problem is said to be ‘multi-agent’ if the winning strategy requires the agents to coordinate logically-before making their observations”?
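Here’s the kind of case I have in mind, sketched in code (the setup and names are mine, not from the paper): two copies of one agent get different observations, and winning requires their actions to mesh in a way neither copy can guarantee by optimizing its own observation in isolation.

```python
from itertools import product

# Two copies of the same agent; copy i sees observation i. The payoff
# rewards a *joint* pattern: 10 if the copies' actions differ, else 0.
OBSERVATIONS = (0, 1)
ACTIONS = (0, 1)

def payoff(policy):
    # policy maps observation -> action; both copies run the same policy.
    return 10 if policy[0] != policy[1] else 0

# Policy selection: optimize once, logically before any observation, over
# whole observation->action maps. This finds a winning policy directly.
best_policy = max(
    (dict(zip(OBSERVATIONS, actions)) for actions in product(ACTIONS, repeat=2)),
    key=payoff,
)
print(best_policy, payoff(best_policy))  # {0: 0, 1: 1} 10

# "Winging it": each copy, after seeing only its own observation, asks
# "which action is best for me now?" Both copies run the identical
# deliberation, and neither observation on its own favors action 0 over
# action 1, so nothing guarantees the actions differ without the
# logically-prior step of settling on a joint policy.
```

On this classification, transparent Newcomb lacks the coordination shape, which is why winging it after the observation can still win there.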
I think Eliezer’s goal was mainly to illustrate the kind of difficulty FAI is, rather than the size of the difficulty. But they aren’t totally unrelated; basic conceptual progress and coming up with new formal approaches often requires a fair amount of serial time (especially where one insight is needed before you can even start working toward a second insight), and progress is often sporadic compared to more applied/well-understood technical goals.
It would usually be extremely tough to estimate how much work was left if you were actually in the “rocket alignment” hypothetical—e.g., to tell with confidence whether you were 4 years or 20 years away from solving “logical undiscreteness”. In the real world, similarly, I don’t think anyone knows how hard the AI alignment problem is. If we can change the character of the problem from “we’re confused about how to do this in principle” to “we fundamentally get how one could align an AGI in the real world, but we haven’t found code solutions for all the snags that come with implementation”, then it would be much less weird to me if you could predict how much work was still left.
Nate says: “You may have a scenario in mind that I overlooked (and I’d be interested to hear about it if so), but I’m not currently aware of a situation where the 1.1 patch is needed that doesn’t involve some sort of multi-agent coordination. I’ll note that a lot of the work that I (and various others) used to think was done by policy selection is in fact done by not-updating-on-your-observations instead. (E.g., FDT agents refuse blackmail because of the effects this has in the world where they weren’t blackmailed, despite how their observations say that that world is impossible.)”
Nate says: “The main datapoint that Rob left out: one reason we don’t call it UDT (or cite Wei Dai much) is that Wei Dai doesn’t endorse FDT’s focus on causal-graph-style counterpossible reasoning; IIRC he’s holding out for an approach to counterpossible reasoning that falls out of evidential-style conditioning on a logically uncertain distribution. (FWIW I tried to make the formalization we chose in the paper general enough to technically include that possibility, though Wei and I disagree here and that’s definitely not where the paper put its emphasis. I don’t want to put words in Wei Dai’s mouth, but IIRC, this is also a reason Wei Dai declined to be listed as a co-author.)”
My model is that ‘FDT’ is used in the paper instead of ‘UDT’ because:
The name ‘UDT’ seemed less likely to catch on.
The term ‘UDT’ (and ‘modifier+UDT’) had come to refer to a bunch of very different things over the years. ‘UDT 1.1’ is a lot less ambiguous, since people are less likely to think that you’re talking about an umbrella category encompassing all the ‘modifier+UDT’ terms; but it’s a bit of a mouthful.
I’ve heard someone describe ‘UDT’ as “FDT + a theory of anthropics”—i.e., it builds in the core idea of what we’re calling “FDT” (“choose by imagining that your (fixed) decision function takes on different logical outputs”; see the toy sketch below), plus a view to the effect that decisions+probutilities are what matter, and subjective expectations don’t make sense. Having a name for the FDT part of the view seems useful for evaluating the subclaims separately.
The FDT paper introduces the FDT/UDT concept in more CDT-ish terms (for ease of exposition), so I think some people have also started using ‘FDT’ to mean something like ‘variants of UDT that are more CDT-ish’, which is confusing given that FDT was originally meant to refer to the superset/family of UDT-ish views. Maybe that suggests that researchers feel more of a need for new narrow terms to fill gaps, since it’s less often necessary in the trenches to crisply refer to the superset.
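A toy illustration of that core FDT idea (my toy rendering of Newcomb’s problem, not the paper’s graphical formalism): iterate over possible logical outputs of the (fixed) decision function, holding fixed that the predictor instantiates the same function.

```python
# Toy Newcomb's problem. The predictor runs the same decision function the
# agent does, so fixing the function's logical output fixes both the
# prediction (box contents) and the action taken.

def outcome(decision_output):
    box_b = 1_000_000 if decision_output == "one-box" else 0  # prediction
    box_a = 1_000  # transparent box, always full
    return box_b if decision_output == "one-box" else box_a + box_b

best = max(("one-box", "two-box"), key=outcome)
print(best, outcome(best))  # one-box 1000000
```

The CDT-ish framing intervenes on the action node instead, severing the link to the prediction; intervening on the function’s output is what keeps the two correlated.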
Your comment here makes it sound like the FDT paper said “the difference between UDT 1.1 and UDT 1.0 isn’t important, so we’ll just endorse UDT 1.0”, where what the paper actually says is:
In the authors’ preferred formalization of FDT, agents actually iterate over policies (mappings from observations to actions) rather than actions. This makes a difference in certain multi-agent dilemmas, but will not make a difference in this paper. [...]
As mentioned earlier, the authors’ preferred formulation of FDT actually intervenes on the node FDT(−) to choose not an action but a policy which maps inputs to actions, to which the agent then applies her inputs in order to select an action. The difference only matters in multi-agent dilemmas so far as we can tell, so we have set that distinction aside in this paper for ease of exposition.
I don’t know why it claims the difference only crops up in multi-agent dilemmas, if that’s wrong.
The opening’s updated now to try to better hint at this, with: “Somewhere in a not-very-near neighboring world, where science took a very different course…”