The quantum red pill or: They lied to you, we live in the (density) matrix

Today’s post is in response to the post “Quantum without complications”, which I think is a pretty good popular distillation of the basics of quantum mechanics.
For any such distillation, there will be people who say “but you missed X important thing”. The limit of appeasing such people is to turn your popular distillation into a 2000-page textbook (and then someone will still complain).
That said, they missed something!
To be fair, the thing they missed isn’t included in most undergraduate quantum classes. But it should be.[1]
Or rather, there is something that I wish they had told me when I was first learning this stuff and was confused out of my mind, since I was a baby mathematician and I wanted the connections between different concepts in the world to actually have explicit, explainable foundations and definitions rather than the hippie-dippie timey-wimey bullshit that physicists call rigor.
The specific point I want to explain is the connection between quantum mechanics and probability. When you take a quantum class (or read a popular description like “Quantum without complications”) there is a question that’s in the air, always almost but not quite understood. At the back of your mind. At the tip of your tongue.
It is all around us. Even now, in this very room. You can see it when you look out your window or when you turn on your television. You can feel it when you go to work… when you go to church… when you pay your taxes. It is the world that has been pulled over your eyes to blind you from the truth.
The question is this:
The complex “amplitude” numbers that appear in quantum mechanics really feel like probabilities. But everyone tells us they’re not probabilities. What the hell is going on?
If you are brave, I’m going to tell you about it. Buckle up, Neo.
Quantum mechanics 101
Let me recap the standard “state space” quantum story, as exemplified by (a slight reinterpretation of) that post. Note that (like in the “Quantum without complications” post) I won’t give the most general or the most elegant story, but rather optimize for understandability:
The correct way to model the state of the quantum universe is as a state vector[2], |ϕ⟩∈H. Here the “bra-ket” notation |…⟩ is physics notation for “vector”, with |v⟩ representing that we view v as a column vector and ⟨v| representing that we view it as a row vector (we’ll expand this language in footnotes as we go along, since it will be relevant). The “calligraphic” letter H represents a complex Hilbert space, which is (more or less) a fancy term for a (complex) vector space that might be infinite-dimensional. Moreover:
We assume the Hilbert space has a basis S={|s1⟩,|s2⟩,…} of “pure states” (though different bases give more or less the same physics).
We assume the state vector |ϕ⟩ is a unit complex vector when expressed in the state basis: |ϕ⟩=a1|s1⟩+a2|s2⟩+… (here note that ai∈C are complex numbers), so |a1|2+|a2|2+…=1[3].
Whenever we think about quantum mechanics in this formulation, our Hilbert space actually depends on a “number of particles” parameter: H=Hn where n is “the number of particles in the universe”. In terms of the “pure states” basis, the basis of Hn is Sn=S1×S1×…×S1 (n copies of S1). In other words, we have a set of “one-particle” pure states S1, and an n-particle pure state is an n-tuple of elements of S1[4]. We think of these as “tuples of particle locations”: so if we have single-particle states S1={|a⟩,|b⟩,|c⟩} then for n=4, we have 3⁴=81 4-particle states and H=H4 is 81-dimensional. For example H contains the 4-particle state |c,a,a,b⟩ corresponding to “first particle at c, second particle at a, third particle at a, fourth particle at b”, and also linear combinations like |c,a,a,b⟩+2i|a,b,b,a⟩ (note that this last vector is not a unit vector, and needs to be normalized by 1/√5 in order to be an allowable “state vector”).
Quantum states evolve in time, and the state of your quantum system at any time t>0 is fully determined by its state ϕ(0) at time 0. This evolution is linear, given by the “evolution equation”: ϕ(t)=Utϕ(0). Moreover:
The operators Ut are unitary (a natural complex-valued analog of “orthogonal” real-valued matrices). Note that a complex matrix is unitary if and only if it takes unit vectors to unit vectors.
Importantly: Evolution matrices tend to mix states, so if a system started out in a pure state |ϕ(0)⟩=|sk⟩, we don’t in general expect it to stay pure at time t>0.
As t varies, the operator Ut evolves exponentially. We can either model this by viewing the time parameter t as discrete, and writing Ut=U1ᵗ (=U1⋅U1⋅…⋅U1, t times, as is done in the other blog post), or we can use continuous time and write Ut=exp(−iĤ⋅t), where Ĥ is called the “Hamiltonian”. Note that in order for Ut to be unitary, Ĥ must be Hermitian.
When you model the interaction of a quantum system with an external “observer”, there is a (necessarily destructive, but we won’t get into this) notion of “measurement” M(|ϕ⟩). You model measurements as a statistical process whose result is a probability distribution (in the usual sense, not any weird quantum sense) on some set of possible outcomes o1,o2,…. Here the probabilities P(M(|ϕ⟩)=ok) are nonnegative numbers which depend on the state |ϕ⟩ and must satisfy ∑kP(M(|ϕ⟩)=ok)=1 for any fixed quantum state |ϕ⟩.
The most basic form of measurement Mpure, associated to the basis of pure states, returns one of the pure states |sk⟩. The probability that the state s is measured is the squared norm of the coordinate of |ϕ⟩ at pure state s. As a formula in bra-ket notation: P(Mpure(|ϕ⟩)=s):=|⟨s|ϕ⟩|2. Note that these are the squared norms of the coordinates of a unit complex vector—thus they sum to one, as probabilities should. This is the main reason we want states to be unit vectors.
A second kind of measurement is associated to a pair of complementary orthogonal complex projections π1:H→H and π2:H→H (complementarity means π1+π2=I). The measurement then outputs one of two outcomes 1, 2 depending on whether it thinks ϕ is in the image of π1 or π2. (While ϕ is of course a superposition, the measurement reduces the superposition information to a probability in an appropriate way: outcome k occurs with probability ‖πk|ϕ⟩‖².)
The above two scenarios have an obvious mutual generalization, associated with a collection of several orthogonal projections π1,…,πk which sum to I. (A small numerical sketch of this whole setup follows below.)
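To make the axioms above concrete, here is a minimal numerical sketch of the “state vector” story, in Python/NumPy. The Hamiltonian and amplitudes are made-up toy values chosen only for illustration:

```python
# Toy model of the axioms above: a 3-dimensional Hilbert space with
# pure-state basis {|s1>, |s2>, |s3>}.
import numpy as np
from scipy.linalg import expm

# A state vector: complex amplitudes, normalized so squared norms sum to 1.
phi = np.array([1.0, 2.0j, 0.0])
phi = phi / np.linalg.norm(phi)            # now <phi|phi> = 1

# Unitary evolution U_t = exp(-i*H*t) for a Hermitian H (here real symmetric).
H = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
U = expm(-1j * H * 0.7)
phi_t = U @ phi                            # linear time evolution
assert np.isclose(np.linalg.norm(phi_t), 1.0)   # unitarity preserves norms

# Pure-state measurement (Born rule): P(outcome s_k) = |<s_k|phi>|^2.
born_probs = np.abs(phi_t) ** 2
assert np.isclose(born_probs.sum(), 1.0)   # a genuine probability distribution
print(born_probs)
```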
Upshots
The important things to keep in mind from the above:
A state is a complex linear combination of pure states, a1|s1⟩+a2|s2⟩+…, where the ai are complex numbers whose squared norms sum to 1.
States evolve linearly in time. Pure states tend to get “mixed” with time.
A measurement is a process associated to the interaction between the quantum state |ϕ⟩ and a (usually macroscopic) observer. It returns a probability distribution on some set of outcomes that depends on the state |ϕ⟩: i.e., it converts a quantum phenomenon to a statistical phenomenon.
One standard measurement that is always available returns a distribution over pure states. Its probability of returning the pure state |s⟩ is the squared norm of the s-coordinate of |ϕ⟩.
Statistical mechanics 101
The process of measurement connects quantum mechanics with statistical mechanics. But even if I hadn’t talked about measurement in the last section, anyone who has studied probability would see a lot of parallels between the last section and the notion of Markov processes.
Most people are intuitively familiar with Markov processes. A Markov process is a mathematical way of modeling some variable x that starts at some state s (which may be deterministic or already probabilistic) and undergoes a series of random transitions between states. Let me again give a recap:
The correct way to model the state of the universe is as a probability distribution p, which models uncertain knowledge about the universe and is a function from a set of deterministic states S={s1,s2,…} to real numbers. These must satisfy:
p(s)≥0 for each deterministic state s.
∑s∈Sp(s)=1.
We say that a probability distribution p is deterministic if there is a single state s with p(s′)=1 if s′=s and p(s′)=0 otherwise. In this case we write p=δs.
Probability distributions evolve in time, and the state pt of your statistical system at any time t>0 is fully determined by its state p0 at time 0. This evolution is linear, given by the “evolution equation”: pt=Mtp0. In terms of probabilities, the matrix coefficient (Mt)s′,s is the transition probability, and measures the probability that “if your statistical system was in state s at time 0, it occupies state s′ at time t”. In particular:
The operators Mt are Markovian (equivalent to the condition that each column is a probability distribution).
Importantly: Evolution matrices tend to mix states, so if a system started out in a deterministic state p0=δs, we don’t in general expect it to stay deterministic at time t>0.
As t varies, the operator Mt evolves exponentially. We can either model this by viewing the time parameter t as discrete, and writing Mt=M1ᵗ (=M1⋅M1⋅…⋅M1, t times), or we can use continuous time and write Mt=exp(Q⋅t), where Q is called the “rate matrix”. (A parallel numerical sketch follows below.)
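Here is the parallel sketch for the statistical story (again with made-up toy matrices):

```python
# Toy model of the Markov axioms above, parallel to the quantum sketch.
import numpy as np
from scipy.linalg import expm

# A column-stochastic one-step transition matrix M1:
# (M1)[s', s] = P(state s' at time 1 | state s at time 0).
M1 = np.array([[0.9, 0.2],
               [0.1, 0.8]])
assert np.allclose(M1.sum(axis=0), 1.0)    # each column is a distribution

p0 = np.array([1.0, 0.0])                  # deterministic start: p0 = delta_s1

# Discrete time: p_t = M1^t p0. Mixing: the state does not stay deterministic.
p3 = np.linalg.matrix_power(M1, 3) @ p0
assert np.isclose(p3.sum(), 1.0)

# Continuous time: M_t = exp(Q*t) for a rate matrix Q
# (off-diagonal entries >= 0, columns summing to 0).
Q = np.array([[-1.0,  2.0],
              [ 1.0, -2.0]])
p_t = expm(Q * 0.5) @ p0                   # still a probability distribution
assert np.isclose(p_t.sum(), 1.0)
print(p3, p_t)
```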
There are a hell of a lot of similarities between this picture and the quantum picture, though of course we don’t have to separately introduce a notion of measurement here: indeed, in the quantum context, measurement converts a quantum state to a probability distribution but in statistics, you have a probability distribution from the start!
However, there are a couple key differences as well. The standard one that everyone notices is that in the quantum picture we used complex numbers and in the statistical picture, we used real numbers. But there’s a much more important and insidious difference that I want to bring your attention to (and that I have been bolding throughout this discussion). Namely:
The “measurement” translation from quantum to statistical states is not linear.
Specifically, the “pure-state measurement” probability variable associated to a quantum state |ϕ⟩ is quadratic in the vector |ϕ⟩ (with coordinates |⟨s|ϕ⟩|2).
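In code, the quadratic (rather than linear) dependence is a one-line check, using the same toy NumPy conventions as above:

```python
import numpy as np

born = lambda phi: np.abs(phi) ** 2   # pure-state measurement probabilities
phi = np.array([0.6, 0.8j])           # a toy unit state vector

# Quadratic, not linear: scaling the state by 2 scales the probabilities by 4.
assert np.allclose(born(2 * phi), 4 * born(phi))
```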
This seems to dash the hopes of putting both the quantum and statistical pictures of the world on an equal footing, with perhaps some class of “mixed” systems interpolating between them. After all, while the dynamics in both cases are linear, there must be some fundamental nonlinearity in the relationship between the quantum and statistical worlds.
Right?
Welcome to the matrix
We have been lied to (by our quantum mechanics 101 professors. By the popular science magazines. By the well-meaning sci-fi authors). There is no such thing as a quantum state.
Before explaining this, let’s take a step back and imagine that we have to explain probability to an intelligent alien from a planet that has never invented probability. Then here is one possible explanation you can give:
Probability is a precise measure of our ignorance about a complex system. It captures the dynamics of a “minimal bound” on the information we have about a set of “coarse” states in a subsystem S (corresponding to “the measurable quantities in our experimental setup”) inside a large system U (corresponding to a maximally fine-grained description of the universe)[5].
Now whenever we do quantum mechanics, we also implicitly separate a “large” system into an “experimental setup” and an “environment”. We think of the two as “not interacting very much”, but notably measurement is inherently linked to thinking about the interaction of the system and its environment.
And it turns out that in the context of quantum mechanics, whenever you are studying a subsystem inside a larger environment (e.g. you’re focusing on only a subset of all particles in the universe, an area of space, etc.), you are no longer allowed to use states.
Density matrices
Instead, what replaces the “state” or “wavefunction” from quantum mechanics is the density matrix, which is a “true state” of your system (incorporating the “bounded information” issues inherent in looking at a subsystem). This “true state” is a matrix, or a linear operator, ρ:H→H. Note here a potential moment of confusion: in the old “state space” picture of quantum mechanics (that I’m telling you was all lies), the evolution operators were also matrices from H→H. The density matrices happen to live in the same space, but they behave very differently and should by no means be thought of as the same “kind of object”. In particular they are Hermitian rather than unitary.
Now obviously the old picture isn’t wrong. If your system happens to be “the entire universe”, then while I am claiming that you also have this new “density matrix evolution” picture of quantum mechanics, you still have the old “state vector” picture. You can get from one to the other via the following formula:
ρ=|ϕ⟩⟨ϕ|. In other words, ρ is the rank-1 complex projection matrix associated to your “old-picture” state ϕ.
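As a quick numerical sketch (toy numbers again), here is what this rank-1 construction looks like, and why measurement becomes linear in ρ:

```python
# Sketch: the density matrix of a pure state, rho = |phi><phi|.
import numpy as np

phi = np.array([1.0, 2.0j]) / np.sqrt(5.0)   # toy unit state vector
rho = np.outer(phi, phi.conj())              # rank-1 Hermitian projection

assert np.allclose(rho, rho.conj().T)        # Hermitian
assert np.isclose(np.trace(rho).real, 1.0)   # trace 1
assert np.allclose(rho @ rho, rho)           # a projection (pure states only)

# The Born probabilities are now LINEAR in rho: they are its diagonal.
assert np.allclose(np.diag(rho).real, np.abs(phi) ** 2)
```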
Now the issue with states is that there is no way to take a universe state |ϕU⟩ associated to a big system and convert it to a “system state” |ϕS⟩ associated to a small or coarse subsystem. But there is a way to take the density matrix ρU associated to the big system and “distill” the density matrix ρS for the subsystem. It’s called “taking a partial trace”, and while it’s easy to describe in many cases, I won’t do this here for reasons of time and space (in particular, because I haven’t introduced the necessary formalism to talk about system-environment separation and don’t plan to do so).
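That said, just so “taking a partial trace” isn’t totally mysterious, here is a sketch of the standard construction in the simplest possible case: a two-qubit universe split into a one-qubit system and a one-qubit environment. The example is a generic toy, not tied to anything specific above:

```python
# Sketch: partial trace for a universe of two qubits (system x environment).
import numpy as np

# An entangled universe state: (|00> + |11>)/sqrt(2).
phi_U = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_U = np.outer(phi_U, phi_U.conj())

# Trace out the environment: view rho_U as a 4-index tensor T[a,e,b,f]
# (a,b system indices; e,f environment indices) and trace over e=f.
rho_S = np.einsum('aebe->ab', rho_U.reshape(2, 2, 2, 2))

# The result is a rank-2 density matrix: no state vector |phi_S> exists
# with rho_S = |phi_S><phi_S|.
assert np.allclose(rho_S, 0.5 * np.eye(2))   # half identity: maximal ignorance
print(rho_S)
```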
Going back to the relationship between the quantum state and the density matrix: notice that the passage |ϕ⟩↦ρ=|ϕ⟩⟨ϕ| is quadratic. I forgot to bold: it’s quadratic.
What does this mean? Well first of all, this means that the “probability vector” associated to performing a measurement on the state |ϕ⟩ is now a linear function of the “improved” version of the state, namely the density matrix ρ=|ϕ⟩⟨ϕ|. This is a big deal! This means that we might be able to have a linear relationship with the “probability world” after all.
But does this mean that the linear evolution that quantum mechanics posits on the nice vector |ϕ⟩ turns into some quadratic mess? Luckily, the answer is “no”. Indeed, the evolution remains linear. Namely, just from the formula, we see the following identity is true for the “universal” state vector[6]: ρt=Utρ0Ut⁻¹. Now if you expand, you see that each entry of ρt is linear in the entries of ρ0. Thus evolution is given by a linear “matrix conjugation” operator Conj(U)t:OpH→OpH, where “Op” denotes the vector space of operators from H to itself. Moreover, the evolution operators Conj(U)t are unitary[7].
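Here is a sketch of this linearity claim. Flattening ρ into a vector of length N², conjugation by U becomes one big N²×N² matrix; writing it as kron(U, conj(U)) is a standard trick, and the Hamiltonian below is a toy:

```python
# Sketch: density-matrix evolution rho -> U rho U^{-1} is LINEAR in rho.
import numpy as np
from scipy.linalg import expm

N = 3
H = np.diag([0.0, 1.0, 2.5]) + 0.3 * (np.eye(N, k=1) + np.eye(N, k=-1))
U = expm(-1j * H)                          # a toy unitary

phi = np.ones(N, dtype=complex) / np.sqrt(N)
rho0 = np.outer(phi, phi.conj())
rho1 = U @ rho0 @ U.conj().T               # U^{-1} = U^dagger for unitary U

# The same evolution as one big linear operator Conj(U) on flattened matrices
# (row-major flattening: vec(U rho U^dag) = kron(U, conj(U)) vec(rho)).
conj_U = np.kron(U, U.conj())
assert np.allclose(conj_U @ rho0.reshape(-1), rho1.reshape(-1))

# Conj(U) is itself unitary on the N^2-dimensional space of operators.
assert np.allclose(conj_U @ conj_U.conj().T, np.eye(N * N))
```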
So what we’ve developed is a new picture:
The “state” vector |ϕ⟩∈H turns into the “density” matrix ρ:H→H.
The “evolution” operator Ut turns into the “conjugation” operator Conj(U)t.
So now comes the big question. What if instead of the “whole universe”, we are only looking at the dynamics of the “limited information” subsystem? Turns out there are two options here, depending on whether the Hilbert space HS associated with the subsystem is “coupled” (i.e., exchanges particles/energy/etc.) with the Hilbert space HU∖S of the “environment” (a.k.a. the “rest of the universe”).
If HS is uncoupled from its environment (e.g. we are studying a carefully vacuum-isolated system), then while we still have to replace the old state vector picture |ϕ⟩∈H by a (possibly rank >1) density matrix ρ∈OpH, the evolution on the density matrix is still nice and unitary, given by Conj(U)t.
On the other hand, if the system HS is coupled to its environment, the dynamics is linear but no longer unitary. (At least not necessarily). Instead of the unitary evolution, the dynamics on the “density matrix” space of operators OpH is given by the “Lindbladian” evolution formula[8] (also called the “GKSL master equation”, but that name sounds less cool).
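For concreteness, here is a sketch of Lindbladian evolution in the simplest example I know: a single qubit with pure dephasing. The general GKSL form, dρ/dt = −i[H,ρ] + Σk(LkρLk† − ½{Lk†Lk,ρ}), is standard; the specific Hamiltonian, jump operator, and crude Euler integration below are toy choices:

```python
# Sketch: Lindbladian evolution of one qubit with pure dephasing.
# Linear in rho, but no longer unitary: off-diagonal terms decay.
import numpy as np

H = 0.5 * np.array([[1.0, 0.0], [0.0, -1.0]])            # toy Hamiltonian
L = np.sqrt(0.2) * np.array([[1.0, 0.0], [0.0, -1.0]])   # dephasing operator

def lindblad_rhs(rho):
    comm = -1j * (H @ rho - rho @ H)
    diss = (L @ rho @ L.conj().T
            - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L))
    return comm + diss

# Crude Euler integration from the pure state |+> = (|0>+|1>)/sqrt(2).
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho = np.outer(plus, plus.conj())
dt = 0.01
for _ in range(500):
    rho = rho + dt * lindblad_rhs(rho)

print(np.diag(rho).real)       # the diagonal (probabilities) is preserved
print(abs(rho[0, 1]))          # the coherence has decayed toward 0
```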
So at the end of the day we see two new things that occur when modeling any realistic quantum system:
The relevant dynamics happens on the level of the density matrix. This makes the results of measurement linear in the state when viewed as a probability vector.
The linear evolution matrix is not unitary.
In fact, we can say more: the new dynamics interpolates between the unitary dynamics of “fully isolated quantum systems” and the Markovian dynamics of the stochastic evolution picture. If the interaction between the system and its environment exhibits weak coupling and short correlation time (just words for now that identify a certain asymptotic regime, but note that most systems are like this macroscopically), then the Lindbladian dynamics becomes Markovian (at a suitable time step). Specifically, if there are N states, the density matrix at any point in time has N² entries. In this asymptotic regime, all the dynamics reduces to the dynamics of the diagonal density matrices, i.e. linear combinations of the N matrices of the form |s⟩⟨s|, though the different diagonal terms can get mixed. And on large timescales, this mixing is exactly described by a Markov process.
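One way to see the “unitary + decoherence = Markov” claim in miniature: compose one unitary step with full dephasing (zeroing out the off-diagonal terms). The induced map on the N diagonal entries is then an honest stochastic matrix. (A toy sketch, with the caveat that real decoherence is not instantaneous full dephasing.)

```python
# Sketch: one unitary step followed by full dephasing acts on diagonal
# density matrices as the stochastic matrix M[i, j] = |U[i, j]|^2.
import numpy as np
from scipy.linalg import expm

N = 3
H = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
U = expm(-1j * H * 0.3)

M = np.abs(U) ** 2
assert np.allclose(M.sum(axis=0), 1.0)    # columns are distributions

# Check: the diagonal of U rho U^dag agrees with M acting on the diagonal.
p0 = np.array([1.0, 0.0, 0.0])
rho = np.diag(p0).astype(complex)
rho1 = U @ rho @ U.conj().T
assert np.allclose(np.diag(rho1).real, M @ p0)
```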
If you’ve followed me along this windy path, you are now awakened. You know three things:
We live in the matrix. All that we can observe are matrix-shaped density matrices, and—unless you want to take the blue pill and live in a comfortable make-believe world which is, literally, “out of contact with reality [of the external environment]”—there is no such thing as a quantum state.
Statistics (and specifically Markov processes) are part of the same class of behaviors as unitary quantum evolution. In fact, statistical processes are the “default” for large systems.
Realistic small systems which exist in an environment will exhibit a mix of probabilistic and quantum behaviors.
So can I brag to people that I’ve resolved all the “multiverse/decoherence” issues now?
Not really. Certainly, you can fully understand “measurement” in terms of these “corrected” quantum dynamics—it’s no longer a mystery (and has not been for a very long time). And you can design toy models where running dynamics on a “multiverse” exhibits a natural splitting into quantum branches and gives everything you want from decoherence. But the larger question of why and how different quantum “branches” decohere in our real, non-toy universe is still pretty hard and not a little mysterious. (I might write a bit more about this later, but I don’t have any groundbreaking insights for you here.)
Who ordered that?
This is the famous apocryphal question asked by the physicist Isidor Isaac Rabi in response to the discovery of yet another elementary particle (the muon). So who ordered this matrix-flavored craziness, that the correct way to approach modeling quantum systems is by evolving a matrix (entries indexed by pairs of configurations) rather than just a single state?
In this case there actually is an answer: Liouville. Liouville ordered that. Obviously Liouville didn’t know about quantum mechanics, but he did know about phase space[9]. Here I’m going to get a little beyond our toy “Quantum 101” and talk about wavefunctions (in a very, very hand-wavy way. Get it—waves). Namely, something interesting happens when performing “quantization”, i.e. passing from classical mechanics to quantum mechanics: weirdly, “space gets smaller”. Indeed, knowing a bunch of positions of particles is not sufficient to know how they evolve in the classical world: you also need to know their velocities (or equivalently, momenta). So for example in single-particle classical physics in three dimensions, the evolution equation you get is not on single-particle “configuration space” R3, but on the space of (position, momentum) pairs, which is R3+3=R6. In “wavefunction” quantum mechanics, your quantum state loses half of this dimension: the evolution occurs on just 3-dimensional wavefunctions. This is to some extent unavoidable: the uncertainty principle tells you that you can’t independently set the position and the momentum of a particle, since position and momentum are actually two separate bases of the Hilbert space of wavefunctions. But on the other hand, like, classical physics exists. This means that in some appropriate “local/coarse-grained” sense, for a particle in a box separated (but entangled) from the environment of the rest of the universe, position and momentum are two meaningful quantities that can sort of co-occur.
Now there is a certain very natural and elegant quantum-classical comparison, called the “Wigner-Weyl transform”, that precisely relates the space of operators on wavefunctions on R3 (or a more general configuration space) and functions on the phase space R3+3 (or a more general phase space). Thus, when we think in the “density matrix” formalism, there is a natural translation between quantum and classical states and evolutions which (approximately) matches phase-space dynamics with density-matrix dynamics. So in addition to all the good properties of the density matrix formalism that I’ve (badly) explained above, we see a reasonable explanation for something else that was mysterious and nonsensical in the “typical” quantum story.
But don’t worry. If you’re attached to your old nice picture of quantum mechanics where states are wavefunctions and evolution is unitary and nothing interesting ever happens, there’s always the blue pill. The wavefunction will always be there.
Along with the oscillating phase expansion, basics on Lie groups, ⋆ products, and the Wigner-Weyl transform. Oh and did I mention that an intro quantum class should take 3 semesters, not one?
Often called a “wavefunction”
In terms of the bra-ket notation physicists write this requirement as ⟨ϕ|ϕ⟩=1. The way you’re supposed to read this notation is as follows:
- If the “ket” |ϕ⟩ is the column vector of complex numbers with entries a1,a2,…, then the same vector written as a “bra” ⟨ϕ| means ⟨ϕ|:=¯ϕT=(¯a1,¯a2,…). Here the notation ¯a denotes “complex conjugate”.
- When we write a ket and a bra together, we’re performing matrix multiplication. So ⟨v|v⟩=¯vT⋅v as above denotes “horizontal times vertical” vector multiplication (which is the dot product and gives a scalar) and |v⟩⟨v| denotes “vertical times horizontal” vector multiplication (which is the outer product and gives a matrix). A good heuristic to remember is that “stuff between two brackets ⟨…⟩ is a scalar and stuff between two pipes |…| is a matrix”.
There is often some discussion of distinguishable vs. indistinguishable particles, but it will not be relevant here and we’ll ignore it.
I initially wrote this in the text, but decided to replace with a long footnote (taking a page from @Kaarel), since it’s not strictly necessary for what follows.
A nice way to make this precise is to imagine that in addition to our collection of “coarse states” S={s1,s2,…,sm}, which encode “information about the particular system in question”, there is a much larger collection of “fine states” U={u1,u2,…,uN} which we think of as encoding “all the information in the universe”. (For convenience we assume both sets are finite.) For example perhaps the states of our system are 5-particle configurations, but the universe actually contains 100 particles (or more generally, our subsystem only contains coarse-grained information, like the average of a collection of particles, etc.). Given a state of the universe, i.e. a state of the “full/fine system”, we are of course able to deterministically recover the state of our subsystem. I.e., we have a “forgetting information” map: F:U→S. In the case above of 5 particles in a 100-particle universe, the map F “forgets” all the particle information except the states of the first 5 particles. Conversely, given a “coarse” state s∈S, we have some degree of ignorance about the fine “full system” state u∈U that underlies it. We can measure this ignorance by associating to each coarse state a set Us:=F−1(s)⊂U, namely its preimage under the forgetting map.
Now when thinking of a Markov process, we assume that there is an “evolution” mapping A:U→U that “evolves” a state of the universe to a new state of the universe in a deterministic way. Now given such an evolution on the “full system” states, we can try to think what “dynamics” it implies on subsystem states S. To this end, we define the real number Mt(s,s′) to be the average over u∈Us (universe states underlying s) of the indicator function δF(At(u))=s′, where At denotes applying the evolution t times. De-tabooing the word “probability”, this is just the probability that a random “total” state underlying the coarse state s maps to a “total” state underlying s′ after time t.
Now in general, it doesn’t have to be the case that on the level of matrices, we have the Markov evolution behavior: e.g. that M2=M1². For example we might have chosen the evolution mapping A:U→U to be an involution with A2=I, in which case M2 is the identity matrix (whereas M1 might have been essentially arbitrary). However there is an inequality involving entropy (that I’m not going to get into—but note that entropy is explainable to the alien as just a deterministic function on probability distribution “vectors”) saying that for a given value of the single-transition matrix M1, the least possible information you may have about the double-transition matrix M2 is in a suitable sense “bounded” by M1². Moreover, there is a specific choice of “large system” dynamics, sometimes called a “thermal bath”, which gives us time evolution Mk that is (arbitrarily close to) M1ᵏ. Moreover, any system containing a thermal bath will have no more information about multistep dynamics than a thermal bath. Thus in the limit of modeling “lack of information” about the universe, but conditional on knowing the single-time-step coarse transition matrix M1, it makes sense to “posit” that our k-step dynamics is M1ᵏ.
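This construction is easy to simulate; here is a toy sketch (the fine states, evolution, and forgetting map below are all made up for illustration):

```python
# Sketch of the footnote's construction: a deterministic "universe" evolution
# plus a forgetting map induces a stochastic matrix on coarse states.
import numpy as np

N_FINE, N_COARSE = 12, 3
rng = np.random.default_rng(0)
A = rng.permutation(N_FINE)            # deterministic evolution A: U -> U
F = np.arange(N_FINE) % N_COARSE       # forgetting map F: U -> S

# M1[s_next, s] = fraction of fine states u underlying s with F(A(u)) = s_next.
M1 = np.zeros((N_COARSE, N_COARSE))
for s in range(N_COARSE):
    U_s = np.where(F == s)[0]          # the preimage F^{-1}(s)
    for u in U_s:
        M1[F[A[u]], s] += 1.0 / len(U_s)

assert np.allclose(M1.sum(axis=0), 1.0)   # columns are distributions
print(M1)
```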
To prove the following formula holds, all we need is the identity U†=U−1 for unitary matrices. Here the “dagger” notation is a matrix version of |ϕ⟩↦⟨ϕ|, and takes a matrix to its “complex conjugate transpose” U†:=¯UT.
Note that instead of all operators here, it would be sufficient to only look at the (real, not complex) subspace of Hermitian operators which satisfy ρ†=ρ. In this case, lacking complex structure, evolution would no longer be unitary: it would be orthogonal instead.
If you read the massive footnote about “explaining probability theory to an alien” above, you know that whenever we talk about probabilities we are making a secret implicit assumption that we are in the “worst-case” informational environment, where knowing dynamics on the “coarse” system being observed gives minimal information about the environment—this can be guaranteed by assuming the environment contains a “thermal bath”. The same story applies here: a priori, it’s possible that there is some highly structured interaction between the system and the environment that lets us make a “more informative” picture of the evolution, that would depend on the specifics of system-environment interaction; but if we assume that interactions with the environment are “minimally informative”, then any additional details about the rest of the universe get “integrated out” and the Lindbladian is the “true answer” to the evolution dynamics.
The history is actually a bit tangled here with the term attributed to various people—it seems the first people to actually talk about phase space in the modern way were actually Ludwig Boltzmann, Henri Poincaré, and Josiah Willard Gibbs.