Time and Energy Costs to Erase a Bit

Landauer derived a limit on the amount of energy dissipated in order to erase a bit. This limit is $kT \ln 2$, where $k$ is Boltzmann’s constant and $T$ is the temperature of the environment.

However, recently some questions have been raised as to which situations this limit applies to. Maybe spending $kT \ln 2$ of energy only gives you a 50% chance of erasing the bit, which is nearly useless. Does $kT \ln 2$ suffice to allow us to erase a bit with high reliability, or is much more energy (e.g. $100kT$) required to do this? We’ll analyze the energy cost to reliably erase a bit in this post, while also considering the time taken to do so.

The argument for ~$100kT$

First let’s examine the argument for $100kT$. Consider that the bit needs to be represented with two possible states. For simplicity, let’s say we have a particle which can be in either of the two states. To keep the particle from jumping from one state to the other, there needs to be an energy wall between the two states. And this wall needs to be high enough that the particle doesn’t spontaneously jump between them due to thermal noise. If we want a decent reliability (say a $10^{-15}$ probability of error), then giving the wall a height of something like $100kT$ should be more than sufficient, since the chance of jumping a wall of height $E$ goes as $e^{-E/kT}$.

Now to erase a bit, we raise the energy of the state on one side of the wall. If the bit is a 1, then the energy of the particle increases until it is able to jump the wall and land on the other side. (Or if the bit is a 0, then the particle just stays on the other side and nothing happens.) Since we have to raise the particle nearly the entire height of the wall before it can jump over, the energy cost is nearly the height of the wall. So if the bit had a 50% chance of being a 1, then the expected energy cost to erase a bit in this scenario is about $50kT$ (we don’t have to pay any energy cost to raise the energy of a state if the particle is not currently occupying that state).
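
To put rough numbers on this argument (a back-of-the-envelope sketch, assuming the $100kT$ wall figure above and a bit that’s a 1 half the time):

import math

E_wall = 100.0               # wall height in units of kT (assumed above)
p_error = math.exp(-E_wall)  # chance of thermal noise jumping the wall
expected_cost = 0.5 * E_wall # we only pay to lift the side the particle is on
print(p_error)               # ~3.7e-44, far below any reasonable error target
print(expected_cost)         # ~50 kT expected cost per erasure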

Simple straightforward argument for ~$kT \ln 2$

First note that we only need the energy wall between states to be high while we care about the value of the bit and are doing computations with it. Once it comes time to erase a bit, we no longer care at all about whether the bit gets flipped by thermal noise. We’re going to be erasing it in any case.

So consider a computer made out of two parts: the CPU and the Radiator. The CPU is a futuristic reversible computer. It’s actively cooled to a few Kelvin, which uses some energy, but it’s well-insulated enough that the quantity of energy required is relatively small. The CPU isn’t responsible for erasing any information: erasing information at low temperature would be proportionally cheaper, but the energy cost for a refrigerator to pump the resulting heat out of the cold CPU would cancel out the benefit of the low temperature.

Whenever the CPU needs to erase a bit, it sends that bit to the Radiator, which is responsible for doing the erasure. The Radiator is exposed to the room-temperature environment, and the heat produced by erasing all the bits flows from the Radiator to the environment. If the environment is at temperature $T$ and the CPU is at a much lower temperature $T_C$, then the energy wall required to keep bits stable in the CPU is something like $40kT_C$. But lifting a particle by $40kT_C$ is a perfectly reasonable energy cost, on the same order as the $kT \ln 2$ of additional energy needed to fully complete the erasure.
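
As a quick sanity check on the orders of magnitude (the specific temperatures here are illustrative assumptions: a ~300 K environment and a ~4 K CPU):

import math

k = 1.380649e-23    # Boltzmann's constant, J/K
T_env = 300.0       # environment temperature in K (assumed)
T_cpu = 4.0         # CPU temperature in K (assumed)
wall_cost = 40 * k * T_cpu               # lifting a particle over a 40*k*T_cpu wall
landauer_cost = k * T_env * math.log(2)  # kT ln 2 paid at the Radiator
print(wall_cost, landauer_cost)          # ~2.2e-21 J vs ~2.9e-21 J: same order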

Computational Study: Time costs and energy costs for reliable bit erasure

Here we consider the problem of erasing a bit in more detail. We won’t worry about having a separately cooled CPU in this model. We’ll show that even at constant temperature $T$, it’s possible to erase a bit with high reliability in a reasonable amount of time for a cost of approximately $kT \ln 2$.

Choosing a model for the flow rate of particles between states

In this model, we’ll have $n$ discrete states, indexed by $i$, each with a definite energy level $E_i$. We’ll study a process that runs from $t = 0$ to $t = t_f$, and we have control of the energy level of each state, written $E_i(t)$, or collectively $E(t)$. Information is represented by the position of a single particle, which can be in any of the $n$ states. We have a probability distribution over states which varies with time, $p_i(t)$.

States can be either connected or disconnected. We represent this by defining a connection matrix $C$. If state $i$ is connected to state $j$, then $C_{ij} = 1$. Otherwise, $C_{ij} = 0$. We also have a characteristic time constant $t_0$, which represents how fast the particle tends to jump between states (due to thermal noise from the environment).

We’ll use the following equation for how $p$ varies with time, assuming we’re given $E(t)$:

$$\frac{dp_i}{dt} = \frac{1}{t_0} \sum_j C_{ij} \left( e^{-\max(E_i - E_j,\,0)/kT}\, p_j - e^{-\max(E_j - E_i,\,0)/kT}\, p_i \right)$$

This formula results in a probability distribution for the particle’s state that converges to the Boltzmann distribution $p_i \propto e^{-E_i/kT}$ over time (at least if the states form a connected graph). In particular, note that for two states $i$ and $j$ that are connected, we have the following steady-state condition:

$$e^{-\max(E_i - E_j,\,0)/kT}\, p_j = e^{-\max(E_j - E_i,\,0)/kT}\, p_i, \quad\text{i.e.}\quad \frac{p_i}{p_j} = e^{-(E_i - E_j)/kT}$$

There are some other possible formulas which also satisfy this condition, but this one is a particularly good model because it limits the maximum flow rate between states to $1/t_0$, which is a relatively pessimistic assumption for the time cost to erase a bit. This formula is also essentially a continuous-time version of the transition probabilities that show up in the Metropolis-Hastings algorithm. Finally, this formula gives flow rates that match those that Landauer gives for his double-well model. (I’m happy to elaborate on this in a comment if anyone asks.)
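
As a quick sanity check, here’s a minimal sketch (in the same units as the simulation below: $kT = 1$, $t_0 = 1$) verifying that the Boltzmann distribution is a stationary point of this flow equation for an arbitrary set of energies:

import numpy as np

rng = np.random.default_rng(0)
E = rng.uniform(0, 5, size=4)              # arbitrary energy levels, units of kT
C = np.ones((4, 4)) - np.eye(4)            # fully connected, no self-loops

E_diff = E.reshape((1, -1)) - E.reshape((-1, 1))
flow = C * np.exp(np.minimum(E_diff, 0))   # rate of flow from state j to state i
rate_matrix = flow - np.diag(flow.sum(0))  # subtract total outflow on the diagonal

p_boltzmann = np.exp(-E) / np.exp(-E).sum()
print(rate_matrix @ p_boltzmann)           # ~0 everywhere: dp/dt vanishes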

As for energy costs, the formula for the expected energy cost to complete an operation looks like this:

$$E_{\text{cost}} = \int_0^{t_f} \sum_i p_i(t)\, \frac{dE_i}{dt}\, dt$$

This just says that the energy cost of changing the energy of a state by $dE_i$ is equal to the probability $p_i$ that the particle is in that state, times $dE_i$.
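
In the discrete-time simulation below, this integral becomes a sum over timesteps; it is exactly the quantity that the line total_cost += np.sum(dE*p) accumulates:

$$E_{\text{cost}} \approx \sum_{\text{steps}} \sum_i p_i(t)\, \big( E_i(t + \Delta t) - E_i(t) \big)$$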

Code

This simulation computes the time and energy cost to erase a bit with error probability $10^{-15}$.

import numpy as np
import matplotlib.pyplot as plt

# constants:
dt = 0.01  # timestep, in units of the characteristic time t_0 (energies in units of kT)


def renormalize_probs(p):
    p = np.maximum(p, 0.)
    return p/p.sum()

def calc_energy_cost(p0, get_energies, connection_matrix, reliability_threshold=1e-15, dest_state=0):
    """ diagonal elements of connection matrix should be 0 """
    p = p0
    t = 0.
    total_cost = 0.
    E_hist = [get_energies(t)]
    p_hist = [p0.copy()]
    cost_hist = [0.]
    while np.sum(p[:dest_state]) + np.sum(p[dest_state + 1:]) > reliability_threshold:
        t += dt
        E = get_energies(t)
        dE = E - E_hist[-1]
        total_cost += np.sum(dE*p)
        E_diff = E.reshape((1, -1)) - E.reshape((-1, 1))
        flow_matrix = connection_matrix*np.exp(np.minimum(E_diff, 0)) # Metropolis rates: flow from j to i (downhill at rate 1, uphill suppressed by exp(-dE))
        dp_dt = flow_matrix - np.diag(flow_matrix.sum(0))
        p = p + dt*(dp_dt @ p)  # not in-place: avoids mutating arrays already stored in p_hist
        p = renormalize_probs(p)
        E_hist.append(E)
        p_hist.append(p)
        cost_hist.append(total_cost)
    # print result
    print("time cost:", t)
    print("energy cost", total_cost)
    # plot result
    fig, axes = plt.subplots(3)
    ax1, ax2, ax3 = axes
    ax1.set(ylabel="E_i"); ax2.set(ylabel="p_i"); ax3.set(ylabel="E_cost", xlabel="t")
    t_data = dt * np.arange(len(cost_hist))
    E_data = np.stack(E_hist, axis=1)
    p_data = np.stack(p_hist, axis=1)
    for i in range(len(p0)):
        ax1.plot(t_data, E_data[i])
        ax2.plot(t_data, p_data[i])
    ax3.plot(t_data, cost_hist, color="k")
    plt.show()
    return total_cost

def line_connect_matrix(n):
    """ construct the connection matrix corresponding to a line graph of size n """
    ans = np.zeros((n, n))
    for i in range(n-1):
        ans[i, i+1] = 1.
        ans[i+1, i] = 1.
    return ans

# standard Landauer bit erasure
a = 0.25  # rate at which energies are ramped, in kT per t_0
def get_energies_basic(t):
    return np.array([0., a*t])
print("\nStandard Landauer bit erasure (no wall to preserve bit identity)")
calc_energy_cost(np.array([0.5, 0.5]), get_energies_basic, line_connect_matrix(2))

# Landauer double well
E_wall = 40
a = 2.0
def get_energies_double_well(t):
    return np.array([0., E_wall, min(a*t, E_wall)])
print("\ndouble well model from Landauer's paper")
calc_energy_cost(np.array([0.5, 0., 0.5]), get_energies_double_well, line_connect_matrix(3))

# particle swap
a = 0.2
E_0 = 40
def get_energies_swap(t):
    E = a*t
    return np.array([max(E_0 - E, 0), max(E - E_0, 0)])
print("\nSwapping a particle from one state to another")
calc_energy_cost(np.array([0., 1.]), get_energies_swap, line_connect_matrix(2))

# two stage erasure
a = 0.1
E_0 = 40
t_e = 850 # time at which we stop swapping and start erasing
E_1 = a*t_e - E_0
def get_energies_two_stage(t):
    if t <= t_e: # swap
        E = a*t
        return np.array([max(E - E_0, 0), max(E_0 - E, 0), max(E_0 - E, 0), max(E - E_0, 0)])
    else: # standard Landauer bit erasure
        t -= t_e
        return np.array([E_1, 0., t*a, E_1])

print("\nCheap and reliable two stage erasure")
calc_energy_cost(np.array([0.5, 0., 0., 0.5]), get_energies_two_stage, line_connect_matrix(4), dest_state=1)

Standard Landauer bit erasure with no energy wall

This costs $kT \ln 2$ in theory if we do it so slowly that the distribution of the particle’s position always matches the Boltzmann distribution. In practice it’s a little bit more energy, but still quite close. Energy cost: ~$0.8kT$. Time cost: ~$140\,t_0$.

Double well from Landauer’s paper

Here we can see a much higher energy cost of ~$20kT$. But at least there’s an energy wall to keep our bit from being corrupted by noise.

Moving the particle between states

As a warmup for erasing a bit reliably (starting from an initial state with an energy wall), we first calculate the cost to move a particle from one state to another (the destination state starts out at high energy so that its chance of containing the particle is initially almost 0). This should be completely free in the limit of infinite time, but has a (quite cheap) cost of ~$0.15kT$ for this particular case. Time cost: ~$375\,t_0$.

From the graph, you can see that we first recover some energy by lowering the destination state, then spend it again lifting the starting state to force the particle into the destination state. This is not perfectly efficient, so there’s still a small net energy cost.

Cheap reliable erasure for about $kT \ln 2$ energy cost

There are four states now, connected in a line: two states in the middle and one state on each end. The two middle states start out at very high energy, forming an energy wall just like the one in Landauer’s double well. The outer two states are what we use to represent the bit. So we’re starting out from a bit representation that’s very reliable and could absolutely be used for computation.

The first stage of erasure is to move the particle from the outer two states to the inner two states following the “moving particle between states” procedure described above. Because of the symmetry, there’s no dissipation caused when the particle jumps between the middle two states. After this operation, there’s no longer an energy wall, and the particle is hanging out in the two middle states. The second stage of erasure is that we just perform a standard cheap Landauer erasure (the first case we looked at).

Energy cost: ~$0.8kT$. Time cost: ~$1200\,t_0$.

Isn’t this really time-expensive?

I mean, not that much. It would be bad if the time required went as the inverse of the error probability: then we’d expect to need ~$10^{15}\,t_0$ of time, and we really could say that reliable bit erasure takes infinite time in the limit of perfect reliability. Instead we’re getting something much more reasonable, seems like maybe $t \propto \log(1/\epsilon)$ (this is my sense just from a bit of messing around).
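
A minimal sketch of where that intuition comes from: the final stage raises the occupied state’s energy at rate $a$, and the simulation stops once the Boltzmann factor $e^{-at}$ falls below the error threshold $\epsilon$, so the time should grow like $\log(1/\epsilon)/a$. (This is a back-of-the-envelope estimate, not a rerun of the full simulation.)

import math

a = 0.25                                # ramp rate from the first simulation, kT per t_0
for eps in [1e-5, 1e-10, 1e-15, 1e-20]:
    t_estimate = math.log(1 / eps) / a  # time for exp(-a*t) to drop below eps
    print(eps, t_estimate)              # each extra factor of 10 costs only ~9 t_0 more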

Q: The constant factor is pretty bad, though, isn’t it? Since we’re dealing with hundreds of $t_0$?

A1: First of all, I made the energies simple piecewise-linear functions of time for ease of programming, but this is definitely not optimal. You want to move energies more slowly when they’re very close to each other, and more quickly when they’re very different. I bet the improvement could be pretty significant if we actually controlled the energies optimally.
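
For example, here’s a hypothetical schedule (a sketch, not optimized) that could be passed to calc_energy_cost in place of get_energies_basic: it ramps exponentially, creeping while state 1 is still appreciably occupied and sprinting once its Boltzmann weight is negligible:

a = 0.25
def get_energies_accel(t):
    # Hypothetical non-linear schedule: dE/dt is small while E is within a few
    # kT of the occupied state, and large once the occupancy is negligible.
    return np.array([0., np.exp(a*t) - 1.])

# calc_energy_cost(np.array([0.5, 0.5]), get_energies_accel, line_connect_matrix(2))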

A2: Even if we don’t keep them at different temperatures, the CPU/Radiator distinction is still useful for thinking about futuristic reversible computers. The key figures of merit for the CPU include both the throughput of operations and the serial speed of operations, while for the Radiator what matters is the throughput and energy cost of bit erasures. Having to put up with a 100x slowdown in operation time would be really terrible for CPU serial operation speed. But this analysis doesn’t pertain to CPU operation speed, just to Radiator bit erasure speed, where the slowdown can be compensated for simply by putting 100x as many bit-erasing modules in the Radiator. This has some materials cost, but making the Radiator less efficient also has materials costs, since it would have to be that much larger to radiate the heat away, not to mention the energy costs, which are per-computation and not amortized.