Hmm, isn’t the issue that you’re enthalpymaxxing instead of free-energymaxxing?
That’s one way of putting it, yeah: the band who want to explore a new sound, the hunter who gets sick of eating deer every day, and the LLM with an entropy term in its reward function are all of the same ilk.
Say more? I only know about enthalpy in the physical sense. What does it mean here, and how would switching to free energy change things?
I think J Bostock has a good explanation (see the other reply to my comment). I put some more context at the bottom of this comment.
In physics, systems tend to minimize the free energy $G = H - TS$, not the enthalpy $H$. In all other branches of mathematics (e.g. game theory, RL), they use the right sign convention (where energy is not negative) so you would say systems tend to maximize the free energy $G = TS - H$, not the enthalpy.
If you are purely maximizing enthalpy, everything will go to the highest-enthalpy state. This is the mode collapse issue you see. But why is the system purely optimizing for enthalpy? Either the temperature is very low, or more likely there are hidden constraints and the enthalpies you see are not actually the enthalpies you get. For example: bias in your loss function, committees judging based on historical convention instead of merit, or fans pushing away fans of other genres.
If your issue is a low temperature, you can anneal to find better outcomes: increase the temperature so entropy matters more, leading to more exploration, then decrease it again. When you anneal a steel sword, the atoms are doing exactly this, finding better alignments with each other, which makes a more uniform crystal lattice and a less brittle sword.
Once entropy is a consideration, you will still get exponentially more high-enthalpy states, but you get a bigger spread into other states, which prevents mode collapse.
Background
I wrote this a while ago, but never published it. I think it’s a good primer.
Definitions
Enthalpy (H): The kinetic and potential energy (ability to do work).
Entropy (S): The (logarithm of the) number of possible states.
Temperature (T): (One over) imaginary time.
Gibbs free energy (G): $TS - H$, the log-likelihood (logit) of encountering a particular kind of system.
Suppose we have a bunch of possible states with energies $\epsilon_1, \epsilon_2, \ldots$, and an atom (or molecule, or something bigger) is in each state with probability $p_1, p_2, \ldots$. If we have $N$ atoms, the number of possible states is
$$W = \frac{N!}{\prod_i (N p_i)!} \approx C \prod_i p_i^{-N p_i}$$
from Stirling’s approximation, up to a multiplicative constant $C$ that grows only polynomially in $N$. Taking a logarithm, we get
$$\log W \approx \log C - N \sum_i p_i \log p_i.$$
In real life, a kilogram of stuff has on the order of $10^{26}$ atoms, so the second term is going to be much, much larger. Then the entropy,
$$S = -\sum_i p_i \log p_i,$$
is pretty much the log-number of states for each atom. If an atom is equally likely to be in any state, then we would expect atoms to exist in systems where many more states are available. This is pretty much where the second law of thermodynamics comes from: systems tend to end up in places with many more states, i.e. higher entropy. Of course, not every state is equally likely, since some take more energy to get into than others. Suppose we have an isolated system, so there is a fixed supply of energy to go around. If we want to maximize the entropy, under the condition
$$\sum_i p_i \epsilon_i = E$$
for some fixed $E$, then Lagrange multipliers gives
$$p_i \propto e^{-\epsilon_i / T},$$
where the temperature $T$ is the multiplier enforcing the energy constraint.
This is very reminiscent of Schrödinger’s equation,
$$i \hbar \frac{\partial \psi}{\partial t} = \hat{H} \psi, \qquad \psi(t) = e^{-i \hat{H} t / \hbar}\, \psi(0),$$
where $\hat{H}$ (the Hamiltonian) is a matrix where $\hat{H}_{ii} = \epsilon_i$ and $\hat{H}_{ij}$ is the coupling (complex-valued transition rate) between states $i$ and $j$. For this reason, temperature is best thought of as inverse imaginary time:
$$\frac{1}{T} = \frac{i t}{\hbar}.$$
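The counting step above, that the multinomial $W$ behaves like $e^{N S}$ per the Stirling approximation, can be checked numerically. A sketch using only the standard library, with made-up probabilities:

```python
import math

def log_states(counts):
    # Exact log of the multinomial coefficient N! / prod_i (N p_i)!.
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(probs):
    # Per-atom entropy S = -sum_i p_i log p_i (units where k_B = 1).
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.3, 0.2]  # made-up state probabilities
print("S =", entropy(probs))
for n in (100, 10_000, 1_000_000):
    counts = [round(p * n) for p in probs]
    print(n, log_states(counts) / n)  # (1/N) log W approaches S as N grows
```

The per-atom values converge because the Stirling correction grows only logarithmically in $N$, while the entropy term is proportional to $N$.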
However, we won’t always end up in the highest-entropy systems; we’re just more likely to because there are more available states. How much more likely? It should be proportional to the number of available states, $W \approx e^{N S}$. So, the probability of encountering any given system is proportional to
$$e^{N S}\, e^{-N H / T} = e^{N (T S - H) / T},$$
where $H = E / N$ is the energy per atom.
That term in the numerator, $T S - H$, is known as the Gibbs free energy. We tend to end up with systems that maximize this free energy, not enthalpy or entropy. If a system does not, there are three possibilities:
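As a numeric sanity check, here is a sketch (with invented energy levels) showing that the Boltzmann distribution $p_i \propto e^{-\epsilon_i / T}$ really does maximize the free energy $T S - H$ over candidate distributions:

```python
import math
import random

def free_energy(probs, energies, t):
    # G = T*S - H in the maximization convention: entropy bonus
    # at temperature t minus the expected energy.
    s = -sum(p * math.log(p) for p in probs if p > 0)
    h = sum(p * e for p, e in zip(probs, energies))
    return t * s - h

def boltzmann(energies, t):
    # p_i proportional to exp(-eps_i / t).
    weights = [math.exp(-e / t) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

energies = [0.0, 1.0, 3.0]  # invented energy levels
t = 0.7
p_star = boltzmann(energies, t)
g_star = free_energy(p_star, energies, t)

# No randomly chosen competing distribution beats the Boltzmann one.
rng = random.Random(0)
for _ in range(10_000):
    raw = [rng.random() for _ in energies]
    total = sum(raw)
    probs = [r / total for r in raw]
    assert free_energy(probs, energies, t) <= g_star + 1e-12
print(p_star, g_star)
```

At the maximum, $G = T \log Z$ where $Z = \sum_i e^{-\epsilon_i / T}$, which is why the loop never finds anything better.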
We need to let it run a little longer.
It moves through time with a phase shift, $t \to e^{i \theta} t$, rather than real-valued time (ETA: e.g. in macroeconomics, which is what this unfinished post was going to be about).
We’re missing a constraint. Perhaps we’re shining a laser at the atoms so the transition to a higher energy state is subsidized, or perhaps there is a filter that blocks larger molecules from one half of the experiment.
If there are rewards associated with actions, and you choose a distribution over the action space, then the enthalpy is your expected reward and your entropy is the entropy of the distribution. The free energy depends on a choice of temperature $T$, and is (up to a constant shift up or down) $H + TS$. Maximizing the free energy ensures you do some exploration of suboptimal choices; the equation is that if $R_a$ is the reward for action $a$, then your distribution should be $p_a \propto e^{R_a / T}$, which can be derived using Lagrange multipliers.
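That softmax distribution can be sketched directly; the rewards here are invented for illustration:

```python
import math

def softmax_policy(rewards, t):
    # p_a proportional to exp(R_a / t); subtracting the max reward
    # first keeps exp() from overflowing without changing the result.
    m = max(rewards)
    weights = [math.exp((r - m) / t) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]

rewards = [1.0, 0.9, 0.2]  # invented rewards for three actions

# Low temperature: nearly greedy, i.e. pure enthalpy (reward) maximization.
print(softmax_policy(rewards, t=0.01))
# High temperature: entropy dominates and the policy spreads out.
print(softmax_policy(rewards, t=10.0))
```

Temperature is the single knob trading off exploitation against exploration: as $T \to 0$ the policy collapses onto the argmax action, and as $T \to \infty$ it approaches uniform.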
I don’t think this is sufficient to avoid mode collapse, however. LLMs experience mode collapse when they are trained on their own output, even though they use this very same equation for the distribution over next tokens.
Maybe there is some way to reward exploration explicitly, so that less-frequent tokens have a boost in visibility, but that sounds like a research question, not established fact.
Edit: I have a sign error. It should be here.