Common sense quantum mechanics

Related to: Quantum physics sequence.

TLDR: Quantum mechanics can be derived from the rules of probabilistic reasoning. The wavefunction is a mathematical vehicle to transform a nonlinear problem into a linear one. The Born rule that is so puzzling for MWI results from the particular mathematical form of this functional substitution.

This is a brief overview of a recent paper in Annals of Physics (recently mentioned in Discussion):

Quantum theory as the most robust description of reproducible experiments (arXiv)

by Hans De Raedt, Mikhail I. Katsnelson, and Kristel Michielsen. Abstract:

It is shown that the basic equations of quantum theory can be obtained from a straightforward application of logical inference to experiments for which there is uncertainty about individual events and for which the frequencies of the observed events are robust with respect to small changes in the conditions under which the experiments are carried out.

In a nutshell, the authors use the “plausible reasoning” rules (as in, e.g., Jaynes’ Probability Theory) to recover the quantum-physical results for the EPR and Stern-Gerlach experiments by adding a notion of experimental reproducibility in a mathematically well-formulated way and without any “quantum” assumptions. Then they show how the Schrodinger equation (SE) can be obtained from the nonlinear variational problem on the probability P for the particle-in-a-potential problem when the classical Hamilton-Jacobi equation holds “on average”. The SE allows one to transform the nonlinear variational problem into a linear one, and in the course of said transformation, the (real-valued) probability P and the action S are combined into a single complex-valued function ~P^(1/2) exp(iS), which becomes the argument of the SE (the wavefunction).

This casts the “serious mystery” of Born probabilities in a new light. Instead of the observed frequency being the square(d amplitude) of the “physically fundamental” wavefunction, the wavefunction is seen as a mathematical vehicle to convert a difficult nonlinear variational problem for inferential probability into a manageable linear PDE, where it so happens that the probability enters the wavefunction under a square root.

Below I will excerpt some math from the paper, mainly to show that the approach actually works, but outlining just the key steps. This will be followed by some general discussion and reflection.

1. Plausible reasoning and reproducibility

The authors start from the usual desiderata that are well laid out in Jaynes’ Probability Theory and elsewhere, and add to them another condition:

There may be uncertainty about each event. The conditions under which the experiment is carried out may be uncertain. The frequencies with which events are observed are reproducible and robust against small changes in the conditions.

Mathematically, this is a requirement that the probability P(x|θ,Z) of observation x, given an uncertain experimental parameter θ and the rest of our knowledge Z, is maximally robust to small changes in θ, with the degree of robustness itself independent of θ. Using log-probabilities, this amounts to minimizing the “evidence”

Ev = ln [ P(x|θ+ε,Z) / P(x|θ,Z) ]

for any small ε so that |Ev| is not a function of θ (but the probability is).

2. The Einstein-Podolsky-Rosen-Bohm experiment

There is a source S that, when activated, sends a pair of signals to two routers R_1 and R_2. Each router then sends its signal to one of its two detectors D_i,+ or D_i,– (i = 1, 2). Each router can be rotated, and we denote by θ the angle between them. The experiment is repeated N times, yielding the data set {x_1,y_1}, {x_2,y_2}, … {x_N,y_N}, where x and y are the outcomes registered at the two routers (+1 or –1). We want to find the probability P(x,y|θ,Z).

After some calculations it is found that the single-trial probability can be expressed as P(x,y|θ,Z) = (1 + xy E_12(θ)) / 4, where E_12(θ) = Σ_{x,y=±1} xy P(x,y|θ,Z) is a periodic function of θ.

From the properties of Bernoulli trials it follows that, for a data set of N trials with n_xy total outcomes of each type {x,y},

Ev = Σ_{x,y=±1} n_xy ln [ P(x,y|θ+ε,Z) / P(x,y|θ,Z) ]

and, identifying the frequencies n_xy/N with P(x,y|θ,Z) and expanding this in a Taylor series, it is found that

Ev = –(Nε²/2) Σ_{x,y=±1} (1/P(x,y|θ,Z)) (∂P(x,y|θ,Z)/∂θ)² + O(ε³)

The sum is the Fisher information I_F for P. The maximum robustness requirement means it must be minimized. Writing it down as I_F = (dE_12(θ)/dθ)² / (1 – E_12(θ)²) and requiring it to be independent of θ, one finds that E_12(θ) = cos(θ I_F^(1/2) + φ), and since E_12 must be periodic in the angle, I_F^(1/2) is a natural number, so the smallest possible value is I_F = 1. Choosing φ = π it is found that E_12(θ) = –cos(θ), and we obtain the result that

P(x,y|θ,Z) = (1 – xy cos(θ)) / 4

which is the well-known correlation of two spin-1/2 particles in the singlet state.

Needless to say, our derivation did not use any concepts of quantum theory. Only plain, rational reasoning strictly complying with the rules of logical inference and some elementary facts about the experiment were used.
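To make the robustness argument concrete, here is a small numerical sketch (mine, not the authors'). It assumes the evidence for N trials is Ev = Σ n_xy ln[P(x,y|θ+ε,Z)/P(x,y|θ,Z)] with n_xy replaced by N·P(x,y|θ,Z), and it estimates the Fisher information with a finite difference. The function names and the sample angles are illustrative choices only.

```python
import math

# All four outcome pairs (x, y) with x, y = +1 or -1.
OUTCOMES = [(x, y) for x in (+1, -1) for y in (+1, -1)]


def minus_cos(theta):
    """Correlation E12(theta) = -cos(theta), the result quoted in the text."""
    return -math.cos(theta)


def prob(x, y, theta, corr=minus_cos):
    """Single-trial probability P(x,y|theta) = (1 + x*y*E12(theta)) / 4."""
    return (1.0 + x * y * corr(theta)) / 4.0


def fisher_information(theta, corr=minus_cos, h=1e-5):
    """I_F = sum over outcomes of (dP/dtheta)^2 / P, via a central difference."""
    total = 0.0
    for x, y in OUTCOMES:
        p = prob(x, y, theta, corr)
        dp = (prob(x, y, theta + h, corr) - prob(x, y, theta - h, corr)) / (2.0 * h)
        total += dp * dp / p
    return total


def evidence(theta, eps, n_trials, corr=minus_cos):
    """Ev with the counts n_xy replaced by their expected values N * P (assumption)."""
    total = 0.0
    for x, y in OUTCOMES:
        p = prob(x, y, theta, corr)
        total += n_trials * p * math.log(prob(x, y, theta + eps, corr) / p)
    return total


# Angles chosen away from 0 and pi, where some probabilities vanish.
for theta in (0.3, 1.0, 2.0):
    i_f = fisher_information(theta)
    ev = evidence(theta, 1e-3, 10_000)
    second_order = -10_000 * (1e-3) ** 2 / 2.0 * i_f
    print(f"theta={theta}: I_F={i_f:.6f}  Ev={ev:.6e}  -N*eps^2/2*I_F={second_order:.6e}")
```

For E_12(θ) = –cos(θ) the Fisher information should come out independent of θ, and the evidence should track the second-order Taylor term. Passing a different correlation, e.g. `corr=lambda t: -math.cos(2 * t)`, gives a larger constant value, i.e. a less robust dependence on θ.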

3. The Stern-Gerlach experiment

This case is analogous to, and simpler than, the previous one. The setup contains a source emitting a particle with magnetic moment S, a magnet with field in the direction a, and two detectors D+ and D–.

Similarly to the previous section, P(x|θ,Z) = (1 + xE(θ)) / 2, where θ is the angle between a and S and E(θ) = P(+|θ,Z) – P(–|θ,Z) is an unknown periodic function. By complete analogy we seek the minimum of I_F and find that E(θ) = ±cos(θ), so that

P(x|θ,Z) = (1 ± x cos(θ)) / 2

In quantum theory, [this] equation is in essence just the postulate (Born’s rule) that the probability to observe the particle with spin up is given by the square of the absolute value of the amplitude of the wavefunction projected onto the spin-up state. Obviously, the variability of the conditions under which an experiment is carried out is not included in the quantum theoretical description. In contrast, in the logical inference approach, [equation] is not postulated but follows from the assumption that the (thought) experiment that is being performed yields the most reproducible results, revealing the conditions for an experiment to produce data which is described by quantum theory.

To repeat: there are no wavefunctions in the present approach. The only assumption is that a dependence of outcome on particle/​magnet orientation is observed with robustness/​reproducibility.
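As a quick sanity check, the inference result can be compared with the textbook Born-rule calculation for a spin-1/2 particle. The sketch below takes the “+” sign in E(θ) = ±cos(θ) and assumes the standard amplitudes cos(θ/2) and sin(θ/2) for spin up and spin down along the field direction; the function names are mine.

```python
import math


def p_inference(x, theta):
    """Logical-inference result P(x|theta) = (1 + x*cos(theta)) / 2 for x = +1 or -1."""
    return (1.0 + x * math.cos(theta)) / 2.0


def p_born(x, theta):
    """Born rule for a spin-1/2 state tilted by theta from the field axis.

    The amplitude is cos(theta/2) for spin up (x = +1) and sin(theta/2)
    for spin down (x = -1); the probability is its squared magnitude.
    """
    amplitude = math.cos(theta / 2.0) if x == +1 else math.sin(theta / 2.0)
    return abs(amplitude) ** 2


for theta in (0.0, 0.5, 1.0, 2.0, math.pi):
    for x in (+1, -1):
        print(f"theta={theta:.3f} x={x:+d}: "
              f"inference={p_inference(x, theta):.6f} born={p_born(x, theta):.6f}")
```

The two columns agree because (1 ± cos θ)/2 is just cos²(θ/2) or sin²(θ/2). The difference is that here the left-hand form was obtained from robustness, not postulated.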

4. Schrodinger equation

A particle is located in unknown position θ on a line segment [–L, L]. Another line segment [–L, L] is uniformly covered with detectors. A source emits a signal and the particle’s response is detected by one of the detectors.

After going to the continuum limit of infinitely many infinitely small detectors and accounting for translational invariance, it is possible to show that the position of the particle θ and of the detector x can be interchanged, so that ∂P(x|θ,Z)/∂θ = –∂P(x|θ,Z)/∂x.

In exactly the same way as before we need to minimize Ev by minimizing the Fisher information, which is now

I_F = ∫ dx (1/P(x|θ,Z)) (∂P(x|θ,Z)/∂x)²

However, simply solving this minimization problem will not give us anything new, because nothing so far accounts for the fact that the particle moves in a potential. This needs to be built into the problem, which can be done by requiring that the classical Hamilton-Jacobi equation holds on average. Using the Lagrange multiplier method, we now need to minimize the functional

F = ∫ dx { (1/P(x|θ,Z)) (∂P(x|θ,Z)/∂x)² + λ [ (∂S(x)/∂x)² + 2m (V(x) – E) ] P(x|θ,Z) }

Here S(x) is the action (Hamilton’s principal function), V(x) is the potential, m and E are the particle’s mass and energy, and λ is the Lagrange multiplier. This minimization yields solutions for the two functions P(x|θ,Z) and S(x). It is a difficult nonlinear minimization problem, but it is possible to find a matching solution in a tractable way using a mathematical “trick”. It is known that standard variational minimization of the functional

Q = ∫ dx { 4 (∂ψ*/∂x)(∂ψ/∂x) + 2mλ (V(x) – E) ψ*ψ }

yields the Schrodinger equation for its extrema. On the other hand, if one makes the substitution combining two real-valued functions P and S into a single complex-valued ψ,

ψ(x|θ,Z) = P(x|θ,Z)^(1/2) exp(i λ^(1/2) S(x) / 2)

Q is immediately transformed into F, concluding the derivation of the Schrodinger equation. Incidentally, ψ is constructed so that P(x|θ,Z) = |ψ(x|θ,Z)|², which is the Born rule.
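The claim that the substitution turns Q into F can be checked numerically. The sketch below assumes the integrands are (∂P/∂x)²/P + λ[(∂S/∂x)² + 2m(V – E)]P for F and 4|∂ψ/∂x|² + 2mλ(V – E)|ψ|² for Q, with ψ = P^(1/2) exp(i λ^(1/2) S/2). The Gaussian P, the polynomial S, the harmonic V and all constants are arbitrary test choices of mine, not taken from the paper.

```python
import cmath
import math

LAM = 4.0      # Lagrange multiplier (arbitrary test value)
MASS = 1.0     # particle mass (arbitrary test value)
ENERGY = 0.5   # energy E (arbitrary test value)


def prob(x):
    """Test probability density: a normalized Gaussian."""
    return math.exp(-x * x) / math.sqrt(math.pi)


def action(x):
    """Test action S(x): an arbitrary smooth function."""
    return 0.7 * x + 0.2 * x * x


def potential(x):
    """Test potential V(x): a harmonic well."""
    return 0.5 * x * x


def psi(x):
    """Substitution psi = sqrt(P) * exp(i * sqrt(lambda) * S / 2)."""
    return math.sqrt(prob(x)) * cmath.exp(1j * math.sqrt(LAM) * action(x) / 2.0)


def derivative(func, x, h=1e-5):
    """Central finite difference; works for real and complex-valued functions."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def integrand_f(x):
    """Integrand of F, written in terms of the real functions P and S."""
    dp = derivative(prob, x)
    ds = derivative(action, x)
    return dp * dp / prob(x) + LAM * (ds * ds + 2.0 * MASS * (potential(x) - ENERGY)) * prob(x)


def integrand_q(x):
    """Integrand of Q, written in terms of the complex function psi."""
    dpsi = derivative(psi, x)
    return 4.0 * abs(dpsi) ** 2 + 2.0 * MASS * LAM * (potential(x) - ENERGY) * abs(psi(x)) ** 2


for x in (-1.0, 0.3, 1.5):
    print(f"x={x:+.1f}: F integrand={integrand_f(x):.8f}  Q integrand={integrand_q(x):.8f}")
```

The two integrands agree pointwise, so the functionals agree for any such P and S. With these assumed forms, the extrema of Q satisfy –(2/(mλ)) ψ'' + Vψ = Eψ, which matches the usual stationary SE if one sets λ = 4/ħ².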

Summing up the meaning of Schrodinger equation in the present context:

Of course, a priori there is no good reason to assume that on average there is agreement with Newtonian mechanics … In other words, the time-independent Schrodinger equation describes the collective of repeated experiments … subject to the condition that the averaged observations comply with Newtonian mechanics.

The authors then proceed to derive the time-dependent SE (independently from the stationary SE) in a largely similar fashion.

5. What it all means

Classical mechanics assumes that everything about the system’s state and dynamics can be known (at least in principle). It starts from axioms and proceeds to derive its conclusions deductively (as opposed to inductive reasoning). In this respect quantum mechanics is to classical mechanics what probabilistic logic is to classical logic.

Quantum theory is viewed here not as a description of what really goes on at the microscopic level, but as an instance of logical inference:

in the logical inference approach, we take the point of view that a description of our knowledge of the phenomena at a certain level is independent of the description at a more detailed level.

and

quantum theory does not provide any insight into the motion of a particle but instead describes all what can be inferred (within the framework of logical inference) from or, using Bohr’s words, said about the observed data

Such a treatment of QM is similar in spirit to Jaynes’ Information Theory and Statistical Mechanics papers (I, II). Traditionally statistical mechanics/​thermodynamics is derived bottom-up from the microscopic mechanics and a series of postulates (such as ergodicity) that allow us to progressively ignore microscopic details under strictly defined conditions. In contrast, Jaynes starts with minimum possible assumptions:

“The quantity x is capable of assuming the discrete values xi … all we know is the expectation value of the function f(x) … On the basis of this information, what is the expectation value of the function g(x)?”

and proceeds to derive the foundations of statistical physics from the maximum entropy principle. Of course, these papers deserve a separate post.
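Jaynes’ question has a compact answer: the maximum-entropy distribution has the form p_i ∝ exp(–λ f(x_i)), with λ fixed by the known expectation value. Here is a minimal sketch of that recipe, using a loaded die with a prescribed mean as a toy case. The bisection bounds, function names and the target mean of 4.5 are my illustrative choices.

```python
import math


def maxent_distribution(values, f, target_mean, lam_lo=-20.0, lam_hi=20.0):
    """Maximum-entropy p_i proportional to exp(-lam * f(x_i)) with <f> = target_mean.

    lam is found by bisection; the bounds are assumed to bracket the solution.
    """
    def distribution(lam):
        exponents = [-lam * f(v) for v in values]
        shift = max(exponents)  # subtract the max for numerical stability
        weights = [math.exp(e - shift) for e in exponents]
        z = sum(weights)
        return [w / z for w in weights]

    def mean_f(lam):
        return sum(p * f(v) for p, v in zip(distribution(lam), values))

    # <f> decreases monotonically as lam grows, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lam_lo + lam_hi)
        if mean_f(mid) > target_mean:
            lam_lo = mid
        else:
            lam_hi = mid
    return distribution(0.5 * (lam_lo + lam_hi))


def expectation(probs, values, g):
    """Expectation value of g(x) under the given distribution."""
    return sum(p * g(v) for p, v in zip(probs, values))


faces = [1, 2, 3, 4, 5, 6]
probs = maxent_distribution(faces, lambda x: x, target_mean=4.5)
print("p_i    :", [round(p, 4) for p in probs])
print("<x>    :", expectation(probs, faces, lambda x: x))
print("<x^2>  :", expectation(probs, faces, lambda x: x * x))
```

Knowing only the mean, the least committal assignment tilts the probabilities exponentially toward the high faces, and any other expectation value (here that of x²) follows from it.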

This community should be particularly interested in how this all aligns with the many-worlds interpretation. Obviously, any conclusions drawn from this work can only apply to the “quantum multiverse” level and cannot rule out or support any other many-worlds proposals.

In quantum physics, MWI does quite naturally resolve some difficult issues in the “wavefunction-centric” view. However, we see that the concept of the wavefunction is not really central to quantum mechanics. This removes the whole problem of wavefunction collapse that MWI seeks to resolve.

The Born rule is arguably a big issue for MWI. But here it essentially boils down to “x is quadratic in t where t = sqrt(x)”. Without the wavefunction (only probabilities) the problem simply does not appear.

Here is another interesting conclusion:

if it is difficult to engineer nanoscale devices which operate in a regime where the data is reproducible, it is also difficult to perform these experiments such that the data complies with quantum theory.

In particular, this relates to the decoherence of a system via random interactions with the environment. Decoherence is then not a physical, intrinsically quantum phenomenon of “worlds drifting apart”, but a property of experiments that are not well isolated from the influence of the environment and are therefore not reproducible. Well-isolated experiments are robust (and described by “quantum inference”), while poorly isolated experiments are not (hence quantum inference does not apply).

In sum, it appears that quantum physics when viewed as inference does not require many-worlds any more than probability theory does.