Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran’s group in the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.
E-mail: {first name}@alter.org.il
Quantum Poincaré recurrences:
Thank you so much for this question! It’s silly of me that I hadn’t seriously thought about quantum Poincaré recurrences in this context, but now that I have, I finally see a path towards formally testing the “no BB in FCR” claim.
Explanation for readers who are following some of the formal details of FCR:
Consider the setting of Gergely’s post that I linked, but instead of making time evolution stop after T steps, let it go on forever. The agent’s memory tape is still of size T, so the agent will end up cycling through it and (reversibly) overwriting it infinitely many times. My conjecture is that, in this new setting, we still have a version of Theorem 4.19 with a non-trivial lower bound. This would imply no BB in the traditional sense: despite the Poincaré recurrences, the agent does not experience all possible histories.
Why would that be true? Because, if you measure the agent’s memory at a time when its memory tape is full of “garbage”, the result conveys only a little information about its policy, and it’s unlikely to make the agent “experience” too much in the formal sense defined by the bridge transform. Whereas if you measure the agent’s memory at a time when the Poincaré recurrence has recently reset the tape, the “minimal computations” principle would make it likely for the agent to see the same observations it saw during previous such cycles.
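The classical core of the recurrence part can be checked in a toy model: reversible dynamics on a finite state space is a bijection, so every state lies on a cycle and eventually recurs. A minimal Python sketch under loudly hypothetical assumptions: the environment is a linear congruential counter, the observation is its low bit, and the agent XORs observations into a cyclic tape of size T. None of this is Gergely’s actual construction, and it is classical rather than quantum.

```python
from itertools import product

# Illustrative parameters (assumptions, not from the post): a linear congruential
# "environment" env -> (A * env + C) % M, which is a bijection because gcd(A, M) == 1.
M, A, C = 11, 3, 5

def step(state, T):
    """One reversible time step: XOR the environment's observation bit into the
    tape cell under the pointer, advance the pointer cyclically, update env."""
    env, pos, tape = state
    cells = list(tape)
    cells[pos] ^= env & 1                        # reversible overwrite (XOR)
    return ((A * env + C) % M, (pos + 1) % T, tuple(cells))

def unstep(state, T):
    """Exact inverse of `step`, witnessing that the dynamics is reversible."""
    env, pos, tape = state
    prev_env = (pow(A, -1, M) * (env - C)) % M   # modular inverse (Python 3.8+)
    prev_pos = (pos - 1) % T
    cells = list(tape)
    cells[prev_pos] ^= prev_env & 1
    return (prev_env, prev_pos, tuple(cells))

def is_reversible(T):
    """Check unstep(step(s)) == s on every state (env, pointer, tape)."""
    states = product(range(M), range(T), product((0, 1), repeat=T))
    return all(unstep(step(s, T), T) == s for s in states)

def recurrence_time(initial, T):
    """Steps until the full state first returns. Terminates because a bijection
    on a finite set puts every state on a cycle."""
    state, n = step(initial, T), 1
    while state != initial:
        state, n = step(state, T), n + 1
    return n

print(is_reversible(4))                          # True
print(recurrence_time((0, 0, (0, 0, 0, 0)), 4))  # 20
```

With T = 4 the full state first recurs after 20 steps, by which point every tape cell has been overwritten five times.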
Explanation for readers who are not following the formal details of FCR:
The thing is, whether we “must” compute something or not is not a binary. Rather, the probability we compute something increases with the total variation distance between the distributions we need to distinguish. So, as long as different agent policies produce similar distributions, we only have a low probability of computing the policy (which in this framework is equivalent to the agent “experiencing” something). I think this also addresses your remark about “NP”: it’s likely easy to approximate the thermal equilibrium distribution without simulating brains.
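For reference, the total variation distance between two finite distributions is half their L1 distance (equivalently, the largest gap |P(E) − Q(E)| over events E). A minimal Python sketch; the outcome labels and numbers are made up, and the sketch does not model how FCR turns this distance into a probability of computing the policy.

```python
from typing import Dict, Hashable

def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """TV distance between two finite distributions given as {outcome: probability}."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

# Hypothetical coarse-grained outcome distributions induced by two different
# agent policies; nearly identical, so the distance is small (about 0.01).
policy_a = {"macrostate_1": 0.50, "macrostate_2": 0.50}
policy_b = {"macrostate_1": 0.51, "macrostate_2": 0.49}
print(total_variation(policy_a, policy_b))
```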
Other “observers” without subjective experiences:
I don’t know, but you can in principle use this theory to predict the experiences of an agent inside something like Wigner’s friend experiment or any other scenario that violates decoherence. Implementing such an agent would require a quantum computer.
I think that the solution to the puzzle of Boltzmann Brains will come out of the interpretation of quantum mechanics via the lens of Formal Computational Realism (FCR). On that view, the universe is sampling every possible quantum observable s.t. (i) the marginal distribution of each observable agrees with the Born rule (ii) the overall amount of computation made is minimal. (Tbc this is a very informal description of a rigorous mathematical framework.) For a time moment
In fact, given two late moments
That said, a fully formal analysis of BB in the framework is still pending.
Formal Computational Realism also aims to solve the confusions surrounding computationalism (as the name suggests). The key philosophical insight is that computations are more fundamental than “atoms”, rather than emergent from them. Physical theories are instead a sort of book-keeping device for predicting which computations actually occur.
While it is possible, in some sense, to answer which computations occur according to a physical theory (this is what the “bridge transform” operator in FCR is doing), this requires information not contained in the physical theory itself, namely knowledge about mathematics. Notice that when we use physical theories in practice, we invoke our knowledge about mathematics all the time. We might naively imagine that it makes sense to think of a mathematically omniscient mind using the same physical theory to draw similar conclusions; however, it doesn’t really make sense: the existence of such an omniscient mind would require all possible computations to already occur inside the mind (or in the process of creating it).
Another such approach is computational superimitation (COSI), which seems to make a totally different set of assumptions (which very few people understand well enough to question). I hope that Vanessa Kosoy and Diffractor do not unilaterally decide that they have properly specified alignment, and then actually try to build an ASI based on COSI.
(I haven’t read the entire post yet, just wanted to respond to this point. The following is on behalf of myself and CORAL, but Diffractor might have his own take.)
I hope we will build ASI based on COSI (or some evolution of COSI), but it will be when:
1. The theory is much, much more developed.
2. The assumptions are extensively validated in theory, by some combination of:
   a. Reducing the assumptions as much as possible to a simple and intuitive core.
   b. Studying the theoretical implications of the assumptions in detail, to see that they lead to a comprehensive, coherent and convincing mathematico-philosophical view.
   c. Tying the assumptions to knowledge in other fields, such as physics, cognitive science and evolutionary biology.
3. The assumptions are extensively validated in practice, by building scaled-down models and studying them with interpretability tools that also come out of the theory.
4. Waiting for an even stronger validation is infeasible because unaligned ASI is about to emerge from other projects, and the other projects refuse to coordinate on a pause.
As to “unilaterally”, we are very interested in thoughtful critique from other researchers. We are also going to vocally support a global AI moratorium that would apply to us. But, if there is no moratorium, we don’t commit to waiting for a global academic consensus that will never come (see point 4 above).
If someone wants to someday want to understand what you sometimes do with math besides… turning the math into exact code… …prove AI mathematically safe; which again, to be clear, is not a kind of thing that math can do in principle...
I want to push back against this some. (I’m not sure whether I’m arguing with the actual Yudkowsky, or with a plausible misinterpretation of Yudkowsky, but it seems worth saying either way.)
Some things with which I agree:
The safety of a given AI design depends not only on facts about math, but also on facts about the physical world.
Therefore, it is not possible to prove an AI design to be safe using math alone, without invoking any empirically grounded knowledge about the physical world.
Moreover, any sane project building safe ASI would conduct empirical tests of some kind.
However, it is also true that:
“Turning math into exact code” is actually pretty commonplace and not at all exotic or outlandish, like the quoted text might seem to imply. There is an entire mathematical science of algorithms, and many algorithms produced by this theory are routinely turned into exact code.
While it is true that (i) there are ways to incorporate heuristics into your code while staying safe, and also (ii) mathematical models can be used to reason about code by way of analogy, rather than direct implication, I also believe that (iii) Plan A for safe ASI should be that at least some critical core of the code will exactly correspond to the math, and we will even formally verify that this critical core satisfies the relevant theorems. At the very least, a sane civilization that’s not racing into doom would build ASI in this way.
There is nevertheless a reasonable sense in which AI can (and should) be “proven to be mathematically safe”. Of course, the mathematical argument that proves the AI to be safe rests on some assumptions that, at the very least, need to be empirically grounded. This is not dissimilar from cryptography, where we can prove a protocol to be mathematically safe, but must still work hard to ensure that the implementation actually obeys the assumptions of the mathematical model. And yet, the mathematical safety proof can (and probably must) do a lot of the heavy lifting in establishing a strong overall case for safety.
This doesn’t contradict the OP, but is still important to note: to the extent that the safety case rests on experiments, these experiments must be interpreted through the lens of mathematical theory—otherwise there is (IMO) little chance of inferring the right generalizations from them.
Shifting the losses by one time step doesn’t really matter, since we’re mostly interested in the shape of the regret bound which (up to mild changes in the constants) is not affected by this.
a subset? Why is it not just that product space? I’m assuming it’s because this is a set of partial functions, but I don’t see how taking a subset lets you account for that.
You’re absolutely right, it should be a quotient space, not a subspace. In principle, it can be represented as a closed subspace of the product of copies of
In this case, as written, you don’t need to say “An open set is then an arbitrary union of basis elements”
Actually, we do? For example, consider the space
However, this set is not a basis set.
It is covered in the proof section.
You’re right, it’s supposed to be
where
Good catch, this sentence is very confused.
Epistemic status: half-baked
Arguably, an aligned AI should be aligned to the user’s prior as well as the user’s utility function. Hence, any value-learning protocol should also be doing prior-learning. The problem is, any learning process requires (explicitly or implicitly) its own prior. But shouldn’t this also be the user’s prior? Is this an infinite regress? Maybe not: here is a way out that seems elegant in some sense.
For now, we will work in the Bayesian framework. Let
Mathematically, this is saying that there should be constant
So,
The problem is, this doesn’t describe a Bayesian agent: as the AI accumulates more evidence, its prior changes and hence its belief changes in a non-Bayesian way. Maaaybe this is some kind of “radical probabilism” (I don’t understand the latter well enough to say). From a different angle, what I really want is an a priori (“updateless”) specification of the agent’s policy, and atm I don’t know how to reconcile it with this “eigenprior”.
Also, this feels too good to be true: we get a canonical prior out of nothing? This brings to mind the sort of negative results found by Muller and then Hilton and Kramar. What I expect to be more likely is that we do need to choose some “ur-prior” for the AI, but maybe the sensitivity to this choice can be reduced by this kind of method. Perhaps the full-fledged setting with infinitely many universes will admit existence but not uniqueness of eigenvectors, and then the precise choice of eigenvector will depend on the ur-prior.
What I meant is not “people only care about ~Dunbar number of people”, but something more like “the closest ~Dunbar number of people have [some fraction around the range 1/1000-1/2] of the total value”. Giuseppe Garibaldi was also influenced by considerations such as increasing his own status (or maybe even posthumous reputation).
As to “humans are not capable to behave this way rationally”, I disagree. (The whole point of decision theories like UDT/FDT is that you don’t need to rewrite your source code to behave in an a priori-optimal way, and I believe that I’m fully capable of following the recommendations of such decision theories, and do follow their recommendations.) There is probably also a sense in which we value something vaguely akin to “abstract moral concepts”, but this cashes out to something very different from utilitarianism (closer to virtue ethics).
I’m not sure what you mean by “does not scale much”, but I agree with everything else. (My own ideal outcome is not literally “I am the queen”, but the same principle applies.)
The above treatment of “CDT precommitment games” is problematic: the concept
Definition: A CDT decision problem is the following data. We have a set of variables
The parent relation must induce an acyclic directed graph. We also have a selected subset of decision variables
This is connected to our overall formalism by setting
The CDT counterfactuals and decision-rule are defined via a do-operator that forces
Definition: A CDT precommitment game is a CDT decision problem in which there is some special
1.
2. For some
3. For every
4. For every
This is connected to our abstract notion of precommitment game by setting
The underlying decision problem of the precommitment game is constructed by deleting
The game is said to be trivial when all variables with parent
Proposition: CDT is precommitment-stable in trivial precommitment games.
Definition: Given a CDT precommitment game with
Proposition: If
Above, I compare different decision theories to FDT. At the same time, I claim that in a deeper sense, FDT is ill-defined. One may doubt whether that is a coherent line of reasoning. Therefore, instead of a comparison to FDT, I propose to frame these observations as being about stability to precommitments. Details follow.
Definition: A precommitment game
1. We are given some
2. We are given
3. For any
4. Denote
The restriction of
Definition: An EDT precommitment game
1.
2.
The underlying decision problem is then an EDT decision problem with the belief
(Is there a natural generalization without the assumption
Proposition: EDT is precommitment-stable in formally causal precommitment games. That is, in any such game there is
For example, XOR blackmail can be formalized as an EDT precommitment game which is not formally causal and EDT is not precommitment-stable there (the only optimal policy is precommitting to reject).
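For readers who haven’t seen XOR blackmail, here is a minimal Python sketch of its usual telling. Assumptions: the termite damage and the demanded payment use the conventional illustrative numbers, the prior on termites is a free parameter, and this is a plain expected-value calculation rather than the EDT precommitment game formalization itself. A predictor sends the letter iff (termites XOR the agent would pay).

```python
DAMAGE = 1_000_000  # cost of termites (conventional illustrative number, assumed)
FEE = 1_000         # payment demanded in the letter (conventional, assumed)

def xor_policy_value(pays_on_letter: bool, p_termites: float) -> float:
    """Expected utility of a policy fixed in advance (the precommitment view)."""
    value = 0.0
    for termites, prob in ((True, p_termites), (False, 1 - p_termites)):
        letter = termites != pays_on_letter          # predictor's XOR rule
        loss = (DAMAGE if termites else 0) + (FEE if letter and pays_on_letter else 0)
        value -= prob * loss
    return value

def edt_value_given_letter(pays: bool) -> float:
    """EDT after the letter arrives: conditioning on the letter, paying implies
    no termites and refusing implies termites."""
    termites = not pays
    return -((DAMAGE if termites else 0) + (FEE if pays else 0))

# Refusing is the better policy ex ante, yet EDT pays once the letter arrives.
print(xor_policy_value(False, 0.01) > xor_policy_value(True, 0.01))   # True
print(edt_value_given_letter(True) > edt_value_given_letter(False))   # True
```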
Definition: [EDIT: The treatment of CDT here is problematic, see child post.] A CDT precommitment game
The underlying decision problem is then a CDT decision problem with
Proposition: CDT is precommitment-stable in policy-bottlenecked precommitment games. That is, in any such game there is
For example, Newcomb’s paradox can be formalized as a CDT precommitment game which is not policy-bottlenecked, and CDT is not precommitment-stable there (the only optimal policy is precommitting to one-box).
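The gap between the in-the-moment CDT choice and the best precommitment can be seen with a minimal Python sketch of Newcomb’s problem. Assumptions: the box contents are the conventional illustrative amounts, and the predictor is right with some probability `accuracy`; this is a plain expected-value calculation, not the CDT precommitment game formalization.

```python
BIG = 1_000_000  # opaque box, filled iff one-boxing was predicted (conventional, assumed)
SMALL = 1_000    # transparent box (conventional, assumed)

def newcomb_policy_value(one_box: bool, accuracy: float) -> float:
    """Expected payoff of committing to a policy when the predictor is correct
    with probability `accuracy`."""
    p_filled = accuracy if one_box else 1 - accuracy
    return p_filled * BIG + (0 if one_box else SMALL)

def cdt_prefers_two_box(p_filled: float) -> bool:
    """CDT holds the (already fixed) box contents constant, so two-boxing
    dominates whatever probability it assigns to the box being filled."""
    return p_filled * BIG + SMALL > p_filled * BIG

print(newcomb_policy_value(True, 0.99), newcomb_policy_value(False, 0.99))
print(all(cdt_prefers_two_box(i / 10) for i in range(11)))  # True
```

Under these numbers, committing to one-box is better whenever the predictor’s accuracy exceeds (BIG + SMALL) / (2 · BIG), i.e. just over one half.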
Definition: A DDT precommitment game
The underlying decision problem is then a DDT decision problem with
Proposition: IDDT is precommitment-stable in pseudocausal precommitment games. That is, in any such game there is
It should be straightforward to also formulate an analogous claim with plain DDT and iterated pseudocausal precommitment games.
To make the claim that DDT/IDDT is precommitment-stable more often than EDT and CDT, we need to somehow compare different decision theories on the same game. For this purpose, we have the following translations.
Definition: Given an EDT precommitment game with
Proposition: If
Definition: Given a CDT precommitment game, its DDT-translation is defined by setting
Proposition: If
Below we only use the case
There is no objective morality, but there is such a thing as objectively rational decision-making. And I never said anything about egoism.
Your comment sounds to me like it’s coming from a particular school of moral philosophy discourse, which (in my view) is built on the erroneous redefinition of words. In particular, “moral” and “rational”, together with various synonym-ish words, mean very different things in colloquial speech, but this type of moral philosophy discourse conflates them. In theory, you can of course define your words any way you like. However, if you do so, you relinquish the right to argue from any common-sense intuitive claim that uses these words in their original meaning. (Which, in my view, is how fallacies are smuggled in during this kind of discourse.)
Similarly, “egoism” and “taking rational actions according to your own preferences” are also very different things.
(Thank you for your comment, my explanation here is a useful addition to the OP, I think.)
Takes on moral philosophy and the history of this community that I mostly mentioned before but should maybe be put together somewhere:
Human preferences are very partial/parochial, and this is meta-endorsed. There is a finite number P s.t. for any N>0, the lives of N strangers are less than P times as terminally-valuable for you as the life of your loved one. If you want to be honest with yourself (which you should if it’s high-value for you to have accurate beliefs), you should endorse this.
(Objective, abstract) Morality is fake, both non-cognitivism and error theory have merit. Parochial altruistic preferences (=empathy) are real, rational and superrational cooperation are real. Morality-as-used-in-practice is a process of continuous negotiations about social norms (the “social contract” if you like).
In particular, utilitarianism is very confused. That said, (super)rational cooperation can cash out as something utilitarianism-ish in some situations. (For example, if it is best for everyone if we precommit to derailing the trolley even if a personal friend is on the other track.)
Paradoxes such as Pascal’s mugging, population ethics and infinite ethics all stem from trying to use a confused framework (impartial and unbounded utility).
This type of confusion contributed to the failure of Old MIRI’s agent foundations programme, by causing it to over-index on ideas like Pascal’s mugging and the procrastination paradox.
The self-deceptive endorsement of impartial unbounded utility obscures the importance of multi-agent considerations in morality-as-used-in-practice, and this contributed to failures of the Effective Altruism movement such as SBF and OpenAI. Ideas such as the “pivotal act” are also sus in a similar way (although I can see versions of that which might be justified).
This argument uses the assumption that Alice can’t change eir beliefs in response to learning that Omega has proposed specific bets and not others.
Not true. Changing her beliefs in response to Omega’s proposal doesn’t help her. Imagine that Alice is given a choice between:
1. Take a bet that pays +2 if X and −1 if not-X.
2. Take a bet that pays +2 if not-X and −1 if X.
3. Refuse both bets.
No matter what probability Alice assigns to X after her update, “normal” Bayesian calculus (really CDT calculus, see below) mandates that she chooses 1 or 2, not 3.
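The claim that refusal is never chosen can be checked directly: with belief p in X, option 1 is worth 3p − 1, option 2 is worth 2 − 3p, and option 3 is worth 0, so the better of the two bets is always worth at least 1/2. A minimal Python check over an arbitrary grid of beliefs:

```python
def expected_values(p: float):
    """Expected payoff of options 1, 2, 3 when Alice assigns probability p to X."""
    bet_x = 2 * p - 1 * (1 - p)        # option 1: +2 if X, -1 if not-X
    bet_not_x = 2 * (1 - p) - 1 * p    # option 2: +2 if not-X, -1 if X
    refuse = 0.0                       # option 3
    return bet_x, bet_not_x, refuse

def best_option(p: float) -> str:
    """The expected-value-maximizing option for a fixed belief p."""
    names = ("bet_x", "bet_not_x", "refuse")
    values = expected_values(p)
    return names[max(range(3), key=lambda i: values[i])]

print({p: best_option(p) for p in (0.0, 0.3, 0.5, 0.7, 1.0)})
```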
It seems clear that a bookie can reliably make money from gamblers if the bookie knows which horse will win which race; this is not, in the classical way of thinking, a testament to the irrationality of the gamblers.
I guess this example assumes the gamblers are not allowed to update on the offered bets? (Otherwise it doesn’t make sense to me.) Like I said, we don’t assume it here.
Instead, infrabayesianism recommends a strict preference for mixed strategies.
Not really; you’re over-indexing on the somewhat outdated 6-year-old post you’re replying to. It is true that if Alice has a coin that Omega cannot predict, she can come out ahead by betting according to the coin. But, as my 1-2-3 example above demonstrates, this is not the core idea. The “modern” formulation of infra-Bayesianism only allows deterministic policies, whereas randomization is modeled by means such as “taking the action to flip a coin”.
That version relies on a “causal” assumption that Omega’s choices are probabilistically independent of the gambler’s. This assumption seems inherently contrary to the problem description (since Omega can predict the gambler’s choices, and uses those predictions to make its choices).
What is actually going on here is that this is not a Dutch Book argument against Bayesianism per se; it is a Dutch Book argument against Bayesian CDT. CDT-Alice believes that choosing to bet on X doesn’t influence the veracity of X, since there is indeed no physical causal link from the former to the latter (X might even be determined before the bet is offered or made). EDT-Alice can succeed here by noticing that her own choice is correlated with X, and therefore the probability of X differs between the “Alice bets on X” counterfactual and the “Alice bets on not-X” counterfactual.
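A minimal Python sketch of the CDT/EDT contrast on the 1-2-3 bets. Assumption (labeled hypothetical): Omega is a perfect, deterministic predictor that makes X come out against whichever bet Alice takes, so EDT’s conditional probability of X given each action is 0 or 1. CDT holds its belief p fixed across actions and therefore always takes a bet and loses.

```python
# Payoffs as (if X, if not-X) for the three options.
PAYOFF = {"bet_x": (2, -1), "bet_not_x": (-1, 2), "refuse": (0, 0)}

def omega_sets_x(action: str) -> bool:
    """Hypothetical perfect predictor: X is true exactly when Alice bets on not-X."""
    return action == "bet_not_x"

def realized_payoff(action: str) -> int:
    if_x, if_not_x = PAYOFF[action]
    return if_x if omega_sets_x(action) else if_not_x

def cdt_choice(p: float) -> str:
    """CDT: P(X) = p regardless of the action, since the action doesn't cause X."""
    ev = {a: p * ix + (1 - p) * inx for a, (ix, inx) in PAYOFF.items()}
    return max(ev, key=ev.get)

def edt_choice() -> str:
    """EDT: condition X on the action; with a perfect predictor this is deterministic."""
    ev = {a: realized_payoff(a) for a in PAYOFF}
    return max(ev, key=ev.get)

print([realized_payoff(cdt_choice(p)) for p in (0.0, 0.5, 1.0)])  # [-1, -1, -1]
print(edt_choice())                                               # refuse
```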
So, why is this example interesting beyond other examples that undermine CDT?
Mainly, it’s just easier to understand how infra-Bayesianism solves the problem here, and in particular we only need (crisp) credal sets rather than supradistributions (fuzzy credal sets).
Another reason is, the notion of subjective probability is often justified by thinking about bets. But thinking about bets requires a decision theory, and not just a theory of epistemology. Hence, once you notice that you’re confused about decision theory, you should be open to reconsidering the notion of subjective probability as well.
Yet another reason is, there’s something interesting going on where the supra-POMDP method of dealing with Newcombian problems preserves causality in some sense, while the EDT solution “violates” it. I thought it’s notable, although probably more important are the cases where EDT fails altogether (while infra-Bayesianism / DDT succeeds).
I haven’t tried LZP in practice, but you can guess what results to expect by looking at the size of the LZ77-compression of the text. I expect that any remotely decent text prediction algorithm would be based on stochastic process prediction. The deterministic setting is just a toy model.
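A quick way to get the kind of estimate mentioned above is to look at a compressed size. A minimal Python sketch using the standard `zlib` module; note that zlib implements DEFLATE, which combines LZ77 with Huffman coding, so its output size is only a rough proxy for a pure LZ77 compression size. The sample texts are made up.

```python
import random
import zlib

def compressed_ratio(text: str, level: int = 9) -> float:
    """Compressed size / raw size under DEFLATE (LZ77 + Huffman), as a rough
    proxy for LZ77 compressibility. Lower means more predictable text."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text")
    return len(zlib.compress(raw, level)) / len(raw)

repetitive = "abc" * 1000
noisy = "".join(random.Random(0).choices("abcdefghijklmnopqrstuvwxyz ", k=3000))
print(compressed_ratio(repetitive), compressed_ratio(noisy))
```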
Thanks for the catch!
Actually, I think things might be even better than this: in semiclassical quantum gravity in asymptotically de Sitter spacetime, your quantum state is necessarily mixed (due to tracing over things outside the cosmological horizon, which is how you get Unruh radiation). So, there are no quantum Poincaré recurrences. If you plug a stationary mixed state into the FCR interpretation, all the observables become completely frozen in time. If you’re just converging towards a stationary mixed state, I expect the observables to converge towards becoming frozen.
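The standard quantum-mechanical fact underneath the last point can be checked numerically: a mixed state that is a function of the Hamiltonian (e.g. a Gibbs state) is invariant under unitary evolution, so the Born-rule expectation of every observable is constant, whereas a pure superposition of energy eigenstates keeps oscillating. A minimal NumPy sketch; the random Hamiltonian, the Gibbs form and the observables are illustrative assumptions, and neither the horizon tracing nor FCR’s bridge transform is modeled.

```python
import numpy as np

def evolve(rho, H, t):
    """Schrodinger-picture evolution rho(t) = U rho U^dagger with U = exp(-iHt)."""
    E, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T
    return U @ rho @ U.conj().T

def gibbs_state(H, beta):
    """Stationary mixed state exp(-beta H) / Z, built from the spectrum of H."""
    E, V = np.linalg.eigh(H)
    w = np.exp(-beta * (E - E.min()))      # shift by E.min() to avoid overflow
    return V @ np.diag(w / w.sum()) @ V.conj().T

def max_drift(rho, H, A, times=np.linspace(0.0, 10.0, 50)):
    """Largest change of the expectation Tr(rho(t) A) over the sampled times."""
    values = [np.trace(evolve(rho, H, t) @ A).real for t in times]
    return max(values) - min(values)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (X + X.conj().T) / 2                   # random Hermitian Hamiltonian (illustrative)
Y = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (Y + Y.conj().T) / 2                   # random observable (illustrative)
thermal = gibbs_state(H, beta=1.0)

# Qubit example: under H = sigma_x, the pure state |0><0| has <sigma_z>(t) = cos 2t.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
up = np.array([[1, 0], [0, 0]], dtype=complex)

print(max_drift(thermal, H, A))            # numerically zero
print(max_drift(up, SX, SZ))               # close to 2
```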