Beliefs at different timescales
Why is a chess game the opposite of an ideal gas? On short timescales an ideal gas is described by elastic collisions. And a single move in chess can be modeled by a policy network.
The difference is in long timescales: If we simulated elastic collisions for a long time, we’d end up with a complicated distribution over the microstates of the gas. But we can’t run simulations for a long time, so we have to make do with the Boltzmann distribution, which is a lot less accurate.
Similarly, if we rolled out our policy network to get a distribution over chess game outcomes (win/loss/draw), we’d get the distribution of outcomes under self-play. But if we’re observing a game between two players who are stronger than us, we have access to a more accurate model based on their Elo ratings.
Can we formalize this? Suppose we’re observing a chess game. Our beliefs about the next move are conditional probabilities of the form $P_1(x_{k+1} \mid x_0 \cdots x_k)$, and our beliefs about the next $n$ moves are conditional probabilities of the form $P_n(x_{k+1} \cdots x_{k+n} \mid x_0 \cdots x_k)$. We can transform beliefs of one type into the other using the operators
$$(\Pi_n P_1)(x_{k+1} \cdots x_{k+n} \mid x_0 \cdots x_k) := \prod_{i=0}^{n-1} P_1(x_{k+i+1} \mid x_0 \cdots x_{k+i})$$

$$(\Sigma_n P_n)(x_{k+1} \mid x_0 \cdots x_k) := \sum_{x_{k+2}} \cdots \sum_{x_{k+n}} P_n(x_{k+1} \cdots x_{k+n} \mid x_0 \cdots x_k)$$
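These two operators are easy to sketch concretely. The toy implementation below is my own illustration, assuming a hypothetical two-symbol move set and a made-up "sticky" one-step model: `rollout` plays the role of $\Pi_n$, building an $n$-step distribution as a product of one-step terms, and `marginalize` plays the role of $\Sigma_n$, summing out everything after the next move:

```python
import itertools
from typing import Callable, Dict, Tuple

# Hypothetical two-symbol "move" alphabet; a stand-in for actual chess moves.
MOVES = (0, 1)

History = Tuple[int, ...]

def p1(move: int, history: History) -> float:
    """Made-up one-step model P1: repeat the last move with probability 0.8."""
    if not history:
        return 0.5
    return 0.8 if move == history[-1] else 0.2

def rollout(model: Callable[[int, History], float],
            history: History, n: int) -> Dict[History, float]:
    """(Pi_n P1): distribution over the next n moves, as a product of one-step terms."""
    dist: Dict[History, float] = {}
    for future in itertools.product(MOVES, repeat=n):
        prob, h = 1.0, history
        for move in future:
            prob *= model(move, h)  # one factor P1(x_{k+i+1} | x_0 ... x_{k+i})
            h = h + (move,)
        dist[future] = prob
    return dist

def marginalize(pn: Dict[History, float]) -> Dict[int, float]:
    """(Sigma_n Pn): distribution over just the next move, summing out the later ones."""
    marginal = {m: 0.0 for m in MOVES}
    for future, prob in pn.items():
        marginal[future[0]] += prob
    return marginal
```

For instance, `marginalize(rollout(p1, (0,), 3))[0]` gives back `0.8`, matching `p1(0, (0,))`: when the $n$-step model is itself the rollout, marginalizing it recovers the one-step model exactly.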
If we’re logically omniscient, we’ll have $\Pi_n P_1 = P_n$ and $\Sigma_n P_n = P_1$. But in general we will not. A chess game is short enough that $\Pi_n$ is easy to compute, but $\Sigma_n$ is too hard because it has exponentially many terms. So we can have a long-term model $P_n$ that is more accurate than the rollout $\Pi_n P_1$, and a short-term model $P_1$ that is less accurate than $\Sigma_n P_n$. This is a sign that we’re dealing with an intelligence: We can predict outcomes better than actions.
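The cost asymmetry can be checked by brute force. The sketch below again assumes a hypothetical binary move set and a made-up one-step model: computing $\Sigma_n (\Pi_n P_1)$ does recover $P_1$ exactly, but only by summing one term per possible future, and the number of terms grows exponentially in $n$:

```python
import itertools

MOVES = (0, 1)  # hypothetical binary move set, for illustration only

def p1(move, history):
    """Made-up one-step model: repeat the last move with probability 0.8."""
    return 0.5 if not history else (0.8 if move == history[-1] else 0.2)

def sigma_of_rollout(history, n):
    """Compute Sigma_n (Pi_n P1) by brute force, summing over every length-n future."""
    marginal = {m: 0.0 for m in MOVES}
    terms = 0
    for future in itertools.product(MOVES, repeat=n):  # len(MOVES)**n futures
        prob, h = 1.0, history
        for move in future:
            prob *= p1(move, h)
            h = h + (move,)
        marginal[future[0]] += prob
        terms += 1
    return marginal, terms
```

Here `sigma_of_rollout((1,), 4)` returns the exact one-step probabilities `{1: 0.8, 0: 0.2}` after summing `2**4 = 16` terms; the same sum at chess-game lengths is hopeless, which is why the long-term model $P_n$ gets to be more accurate than anything we can derive from $P_1$.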
If instead of a chess game we’re predicting an ideal gas, the relevant timescales are so long that we can’t compute $\Pi_n$ or $\Sigma_n$. Our long-term thermodynamic model $P_n$ is less accurate than a simulation $\Pi_n P_1$ would be. This is often a feature of reductionism: Complicated things can be reduced to simple things that can be modeled more accurately, although more slowly.
In general, we can have several models at different timescales, with $\Pi$ and $\Sigma$ operators connecting all the levels. For example, we might have a short-term model describing the physics of fundamental particles; a medium-term model describing a person’s motor actions; and a long-term model describing what that person accomplishes over the course of a year. The medium-term model will be less accurate than a rollout of the short-term model, and the long-term model may be more accurate than a rollout of the medium-term model if the person is smarter than us.