My side of an argument with Jacob Cannell about chip interconnect losses

Context /​ How I came to write this

Jacob Cannell (@jacob_cannell) made some claims about fundamental limits of interconnect losses on chips in Brain Efficiency: Much More Than You Wanted To Know, and in particular claimed that current chips have interconnect losses close to as low as they will ever get.

When I read that claim I thought it was wrong, as was Jacob’s discussion of interconnect losses more generally, but I didn’t (and still don’t) think the disagreement directly mattered for AI x-risk, so I mostly didn’t want to spend time arguing about it.

But then later Alexander Gietelink Oldenziel wrote $250 prize for checking Jake Cannell’s Brain Efficiency, and I wound up in a 15-comment back-and-forth with Jacob about it, before ducking out. (Other people continued that thread afterwards).

Unfortunately, I quit the discussion while still being confused about where Jacob was coming from. So this post will not be maximally good and useful, sorry. Nevertheless, here’s a summary of my current perspective and understanding, in case anyone cares.

Also see comments section at the bottom for a response by Jacob.

(Jargon level: medium-low maybe? There is still some unexplained physics & EE jargon, but hopefully I made the centrally important parts accessible to non-experts. DM or email me if something is confusing, and I will try to fix it.)

(All numbers in this post should be treated as Fermi estimates.)

(Thanks very much to Jacob for his extraordinary patience in trying to explain to me his perspective on this topic. And also his perspective on many other topics!)

Background to the technical disagreement

“Interconnects” send information from one point to another on a chip.

The fundamental thermodynamic limit for the energy required to send a bit of information from point A to point B is 0.[1] As a stupid example, there is a lot of digital information on Earth, and it all travels roughly a billion kilometers in orbit around the sun each year for roughly zero energy cost.

Chip interconnect losses are obviously much much higher than the thermodynamic limit of “zero”—they might even constitute a majority of chip power consumption these days. Everyone knows that, and so does Jacob. So what is he saying?

I think Jacob divides the world of interconnects into two categories, “reversible” and “irreversible” interconnects, with the former including optical interconnects and superconducting wires, and the latter including normal wires and brain axons. (I’m stating this categorization without endorsing it.)

I think Jacob takes “reversible” interconnects (optical interconnects & superconducting wires) to have a fundamental interconnect loss limit of zero, but to have practical limits such that we’re not expecting to cut orders of magnitude from the total interconnect loss budget this way. I agree with his conclusion here, although we had some disagreements in how we got there. But anyway, that’s off-topic for this post.

(See my brief discussion of optical interconnects here—basically, nobody seems to have even a roadmap to making optical interconnects with such low power that they could replace almost all (say, >90%) of the aggregate on-chip interconnect length.)

Instead our main dispute was about voltages-on-wires, the workhorse of within-chip communication. Pause for:

On-chip wire interconnects for dummies: As background, here is the oversimplified cartoon version of integrated circuits. There are a bunch of metal wires, and there are transistors that act as switches that connect or disconnect pairs of wires from each other. Depending on which transistors are “on” versus “off” at any given time, any given wire may wind up connected to one of two possible fixed-voltage sources (either “ground” = 0 mV, or so-called “$V_{dd}$” ≈ 500 mV).[2] Once any part of that wire is contacting one of those sources, the entire wire (being conductive) winds up at that specified voltage level. And meanwhile a different part of that same wire is connected to the “gate” of one or more transistors, and depending on the wire voltage, that transistor might switch “on” or “off” in turn, which can change the voltage of still other wires, and so on forever. Thus, the wire is a mode of information transmission: one side gets set to a certain voltage, and then that same voltage value can be read off at the other side, which might be quite far away.

The basic idea of a CMOS inverter, as a very simple example of how integrated circuits work. The dot labeled “OUT” indicates a wire, and sometimes that wire is connected to the $V_{dd}$ voltage source, and other times that wire is connected to ground, depending on the voltage of the “IN” wire which actuates the two switches. (Image modified from here.)
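To make that cartoon concrete, here is a toy Python sketch (purely illustrative, nothing like a real circuit simulator; the 500 mV figure is the $V_{dd}$ from above):

```python
# Toy model of the cartoon above (illustrative only). A wire sits at whichever
# fixed voltage source its transistors currently connect it to: ground or Vdd.

VDD = 0.5  # volts ("Vdd" ~ 500 mV)
GND = 0.0  # volts ("ground")

def inverter(v_in: float) -> float:
    """CMOS inverter: a high input connects the output wire to ground;
    a low input connects it to Vdd."""
    return GND if v_in > VDD / 2 else VDD

# One wire's voltage actuates the next gate, which sets another wire, etc.
v = GND
for stage in range(4):
    v = inverter(v)
    print(f"stage {stage}: output wire at {v * 1e3:.0f} mV")
```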

Jacob believes that there is a certain fundamental limit on interconnect losses when we transmit information via voltages on wires—something in the neighborhood of $kT$ per bit per nm of wire length (i.e. on the order of a few fJ per bit per mm), which he claims is not terribly far from what it is in modern chips (I didn’t check that part). I think he’s wrong that this is a fundamental limit.

My model: Why 6 orders of magnitude (OOM) reduction in on-chip interconnect losses is very hard but not impossible

Credit for this way of thinking goes to UC Berkeley professor Eli Yablonovitch (any errors are my own). See for example this 2012 talk with the basic idea, or this 2019 talk with more on implementation concepts & prototypes. By the way, this basic idea is the guiding philosophy of the Center For Energy Efficient Electronics Science, a center Yablonovitch directs, which got $50M of US-NSF funding over 10 years.[3]

My model centers around the following major issue:

  • The easiest way to reliably actuate a switch is with a voltage of at least $\approx 20\,kT/q \approx 500$ mV, high enough to use normal transistors as the switches,[4] as opposed to the weirder approaches in the Appendix. (kT is defined at this link, q is the elementary charge, and 20 is a factor that gets us way above the noise floor.)

  • The most energy-efficient way to reliably send information down a wire is to use a voltage around $20\sqrt{kT\,Z_{\text{eff}}\,f} \approx 0.7$ mV, where 20 is a factor that gets us way above the noise floor, f is the frequency bandwidth of the noise that we care about (let’s say f ≈ 3 GHz for a chip), and $Z_{\text{eff}}$ is the wire’s effective impedance (looking towards the wire from the receiver), let’s say probably 50-100 Ω in practice.[5]
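(If you want to check those two voltage figures yourself, here is a quick Python sketch. The factor of 20, f ≈ 3 GHz, and $Z_{\text{eff}} \approx 100\ \Omega$ are the assumptions from the bullets above, not additional claims:)

```python
import math

k = 1.380649e-23      # Boltzmann constant, J/K
T = 300               # room temperature, K
q = 1.602176634e-19   # elementary charge, C

# Bullet 1: reliable switch actuation needs ~20 kT/q
v_switch = 20 * k * T / q
print(f"switch voltage ~ {v_switch * 1e3:.0f} mV")   # ~500 mV

# Bullet 2: reliable wire signaling needs ~20 * sqrt(kT * Z_eff * f)
Z_eff = 100   # ohms (effective impedance seen by the receiver)
f = 3e9       # Hz (noise bandwidth)
v_wire = 20 * math.sqrt(k * T * Z_eff * f)
print(f"wire voltage   ~ {v_wire * 1e3:.1f} mV")     # ~0.7 mV

print(f"mismatch ~ {v_switch / v_wire:.0f}x (~3 OOM)")
```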

Slide from this Eli Yablonovitch talk making the same point I’m trying to make here.

So there’s a giant (3 OOM) mismatch in what voltage makes sense.

The “easy way” versus the “hard way” to deal with that giant voltage mismatch

The easy way to deal with that giant mismatch is to ignore it. Just use the same 500 mV voltage for both the switches and the wires. Unfortunately, this method is horrifically bad for interconnect losses: the energy required to send a bit down a wire increases quadratically with voltage. So if we use 3 OOM higher voltage than necessary for sending a bit of information down our wires, our interconnect losses are 6 OOM higher than necessary.

Why quadratic? There’s an (undesired / stray) capacitor, one of whose “plates” is the wire, and the other of whose “plates” is whatever conductive stuff is in the vicinity of that wire. When you switch the voltage between ground and $V_{dd}$, you need to charge or discharge that capacitor by flowing current in or out, which dissipates energy $\approx CV_{dd}^2$.[6]
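(In numbers: the “easy way” penalty is the square of the voltage ratio from the previous section:)

```python
V_easy  = 0.5      # volts: wires swung all the way to Vdd
V_noise = 0.7e-3   # volts: the noise-limited wire voltage from above

# Dissipation per bit goes as C*V^2, so the excess loss factor is (V ratio)^2:
print(f"excess interconnect loss ~ {(V_easy / V_noise) ** 2:.1e}x")  # ~5e5, ~6 OOM
```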

The hard way to deal with that giant mismatch is to succeed at one of the challenging approaches discussed in the Appendix. Those approaches all work in principle (I claim), and in some cases there are even proof-of-principle lab demos. But getting them to work at scale, in actual fabs, in a way that’s compatible with every other part of the chip-production process, doesn’t impact the yield or chip cost, and so on—that’s quite a different matter. I’m not particularly holding my breath.

So, people are not currently doing it the “hard way”; instead they are doing it the “easy way”, and they always have been.

Fermi estimate of interconnect losses on the assumption that people continue doing things the “easy way”

Given that people are in fact doing it the “easy way”, we can do a Fermi estimate of interconnect losses as follows. We have $E \approx CV_{dd}^2$ and $C \approx 10\,\varepsilon_0 L$[7] (where L = wire length and $\varepsilon_0$ is the vacuum permittivity), and there’s a factor of ½ because we only need to charge or discharge the capacitor half of the time (i.e., only when a bit is different from the preceding bit). So we finally get a Fermi estimate:

$$\text{interconnect loss} \approx \tfrac{1}{2} \times 10\,\varepsilon_0 \times (500\ \text{mV})^2 \approx 10\ \text{fJ per bit per mm of wire length}$$

which is not wildly different from Jacob’s supposed fundamental limit on wire interconnects mentioned at the top.
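(Here is that arithmetic spelled out, in case anyone wants to fiddle with the inputs; the $ak \approx 10$ capacitance factor is from footnote 7:)

```python
eps0 = 8.854e-12   # F/m, vacuum permittivity
V_dd = 0.5         # volts
L = 1e-3           # meters, i.e. 1 mm of wire

C = 10 * eps0 * L               # stray capacitance, C ~ a*k*eps0*L with a*k ~ 10
E_per_bit = 0.5 * C * V_dd**2   # 1/2 because only ~half of bits differ from the last
print(f"~{E_per_bit * 1e15:.0f} fJ per bit per mm")   # ~11 fJ
```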

Why do I bring up this Fermi estimate? Well, Jacob seems to have updated hard on the fact that his (supposed) fundamental limit is in the same ballpark as actual chips. If his model is totally wrong, doesn’t that seem a remarkable coincidence?? Well, I think his model is totally wrong, and I think that this is a remarkable coincidence, albeit a much less remarkable one than you might think, thanks to the magic of dimensional analysis.

Anyway, given that background context, it seems highly relevant to note that my model predicts interconnect losses in the same ballpark as Jacob’s model, and in the same ballpark as actual chips, if we’re doing things the “easy way” (as opposed to doing the stuff in the Appendix), which, again, is how we make chips today, and always have.

Prospects for doing things “the hard way” in the future

Will people ever switch to doing it “the hard way”? (by following one of the approaches in the Appendix, or something else in that category.) Beats me, especially if we’re talking about what’s going to happen before AGI apocalypse /​ utopia. (And as I wrote at the top, I don’t think this post is relevant for AI x-risk.) I feel pretty strongly that at least some of the approaches in the Appendix are based on sound physics and have no fundamental blockers—and therefore that Jacob is wrong about his supposed fundamental limit on wire interconnect losses. But actually getting these approaches into commercial chips is another matter entirely—one where all kinds of things can go wrong (or be cost-prohibitive).

Jacob’s model

Everything above was my model. What about Jacob’s model? Well, I’m not going to even try to describe it in my own words, because I don’t understand it. Sorry! You can read the thread and citations therein.

Some ways that Jacob & I seem to be talking past each other

I will, however, point to some things that seem to be contributing to Jacob & me talking past each other, in my opinion.

  • Jacob likes to talk about detailed properties of the electrons in a metal wire (specifically, their de Broglie wavelength, mean free path, etc.), and I think those things cannot possibly be relevant here. I claim that once you know the resistance/​length, capacitance/​length, and inductance/​length of a wire, you know everything there is to know about that wire’s electrical properties. All other information is screened off. For example, a metal wire can have a certain resistance-per-length by having a large number of mobile electrons with low mobility, or it could have the same resistance-per-length by having a smaller number of mobile electrons with higher mobility. And nobody cares which one it is—it just doesn’t matter in electronics.[8]

  • I want to talk about wire voltage profiles in terms of the “normal” wire /​ transmission line formulas (cf. telegrapher’s equations, characteristic impedance, etc.), and Jacob hasn’t been doing that AFAICT. I can derive all those wire-related formulas from first principles (ooh and check out my cool transmission line animations from my days as a wikipedia editor!), and I claim that those derivations are perfectly applicable in the context in question (nano-sized wire interconnects on chips), so I am pretty strongly averse to ignoring those formulas in favor of other things that don’t make sense to me.

  • Relatedly, I want to talk about voltage noise in terms of the “normal” electronics noise literature formulas, like Johnson noise, shot noise, crosstalk noise, etc., and Jacob hasn’t been doing that AFAICT. Again, I’m not taking these formulas on faith, I know their derivations from first principles, and I claim that they are applicable in the present context (nano-sized wire interconnects on chips) just like for any other wire. For example, the Johnson noise formula is actually the 1D version of Planck’s blackbody radiation equation—a deep and basic consequence of thermodynamics. Here I’m thinking in particular of Jacob’s comment “it accumulates noise on the landauer scale at each nanoscale transmission step, and at the minimal landauer bit energy scale this noise rapidly collapses the bit representation (decays to noise) exponentially quickly”. I will remain highly skeptical of a claim like that unless I learn that it is derivable from the formulas for electrical noise on wires that I can find in the noise chapter of my electronics textbooks.

  • Jacob wants to describe wires as being made of small (≈1 nm) “tiles”, each of which is a different “bit”, with information flow down wires corresponding to dissipative bit-copying operations, and I reject that picture. For example, take a 100 μm long wire, on which signals propagate at a significant fraction of the speed of light. Now smoothly slew the voltage at one end of the wire from 0 to $V_{dd}$ over the course of 0.1 ns. (In reality, the slew rate is indeed not infinite, but rather limited by transistor capacitance among other things.) Then, as you can check for yourself, the voltage across the entire wire will slew at the same rate at the same time. In other words, a movie of the voltage-vs-position curve on this 100 μm wire would look like a rising horizontal line, not a propagating wave. Now, recall where the Landauer limit comes from: bit-copy operations require $kT \ln 2$ of energy dissipation, because we go from four configurations (00,01,10,11) to two (00,11). The Second Law of Thermodynamics says we can’t reduce the number of microstates overall, so if the number of possible chip microstates goes down, we need to make up for it by increasing the temperature (and hence number of occupied microstates) elsewhere in the environment, i.e. we need to dissipate energy / dump heat. But in our hypothetical 100 μm long wire above, this analysis doesn’t apply! The different parts of the wire were never at different voltages in the first place, and therefore we never have to collapse more microstates into fewer.
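(A quick numeric sanity check on that 100 μm example; the ≈0.5c on-chip signal speed is my own rough assumption:)

```python
c = 3e8              # m/s, speed of light
v_prop = 0.5 * c     # assumed signal propagation speed on the wire
L_wire = 100e-6      # meters: the 100 um wire from the example
t_slew = 0.1e-9      # seconds: time for the driver to swing 0 -> Vdd

t_transit = L_wire / v_prop   # time for a wavefront to cross the whole wire
print(f"transit ~ {t_transit * 1e12:.1f} ps vs slew {t_slew * 1e12:.0f} ps")
print(f"slew is ~{t_slew / t_transit:.0f}x slower, so the wire rises as one unit")
```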

…So anyway, I think our conversation had a bit of an unproductive dynamic where Jacob would explain why what I said cannot possibly be right [based on his “tiles” model], and then in turn I would explain why what he said cannot possibly be right [based on the formulas I like, e.g. telegrapher’s equations], and then in turn Jacob would explain why that cannot possibly be right [based on his “tiles” model], and around and around we go.

Appendix: Some possible technical approaches to making “switches” on future ICs that are compatible with dramatically lower wire voltages than normal transistors are

Strategy 1: Just use normal transistors, but put nanoscale step-up voltage converters between the wires and the transistor gates that they connect to

Step-up voltage converters—which convert a low-impedance low-voltage source into a high-impedance high-voltage source—do not violate any laws of physics, and have no fundamental limit keeping their efficiency below 100%. If we could “just” put billions of tiny efficient 500:1 step-up voltage converters onto our chip, one for each transistor gate, then we could use ≈1 mV wire voltages, and goodbye 99.99%+ of interconnect losses.
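(Numerically, under that hypothetical 500:1-converters-everywhere scenario:)

```python
V_easy = 0.5     # volts: wires swung at Vdd, today's "easy way"
V_wire = 1e-3    # volts: wires swung at ~1 mV, stepped back up 500:1 at each gate

# Wire loss goes as C*V^2, so the fraction of interconnect loss eliminated is:
saving = 1 - (V_wire / V_easy) ** 2
print(f"interconnect loss eliminated: {saving:.4%}")   # 99.9996%
```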

But how do we make billions of tiny efficient step-up voltage converters?? I’m aware of a couple possible approaches.

  • Piezoelectric stacks: As I understand it, you can stack two nanofabricated mechanically-coupled piezoelectric blocks of different sizes, within a slightly-larger (but still nano-sized) stiff mechanical box. When you apply a voltage to one of the piezoelectric blocks, it turns into a strain, and that in turn applies stress to the other piezoelectric block, which induces a voltage. And depending on geometry and materials, you can arrange for the second voltage to be much larger than the first. See paper here (by a group at IBM). I endorse this approach as probably sound in principle. They seem to have abandoned the project, so I guess it probably has challenges in practice; I don’t know the details.

  • Ferroelectric “negative capacitance”: Under certain circumstances, a configuration of electrodes with ferroelectric materials can amplify voltages. I don’t remember the details, but I think I spent a few hours looking into it a decade ago and decided that the theory was probably sound. Work is ongoing I guess—here’s a random 2022 paper from this subfield.

Strategy 2: Find a different kind of switch that can switch at much lower voltage than transistors can, e.g. 1 mV

To be clear, the Landauer limit concerns the energy required to switch a switch, not the voltage required to switch a switch. So there’s no fundamental reason that a switch can’t work directly with much lower voltage.

Here are a few things in this field. All of these are drawn from this 2019 Eli Yablonovitch talk—see there for better explanations and references.

  • NEMS: You can “increase q” and thereby decrease the required voltage $\approx 20\,kT/q$. Wait, what? Recall, the electrostatic barrier is qV, where q is the electric charge of the thing climbing the barrier. In a transistor, the thing climbing the barrier is a single individual electron. But in principle one could develop a mechanical (NEMS) device that would switch a connection on and off the old-fashioned way—by actual physical contact between two conductors. The trick is that a mechanical cantilever could in principle have a charge of 10 e or 100 e or 1000 e or whatever (where e is the electron charge), and therefore a quite low voltage could actuate it without thermal noise being an issue. I’m sure there are issues with reliability, stiction, speed, and so on, but no fundamental barriers to my knowledge. (See the numeric sketch after this list.)

  • Tunnel FETs: Normal transistors work by using the voltage to raise or lower an energy barrier blocking current flow. But a different paradigm is that current flow is due to quantum tunneling, and the voltage is modifying the extent to which quantum tunneling is possible (it depends on aligned energy levels etc.). I’m not super confident that this works even in theory, but my guess is that it probably does work in theory. (As always, developing it to the point where it can be rolled out to chip fabs is another matter entirely.)

  • Spin-orbit torque: You can use an electrical current to switch the magnetization of a magnet via a phenomenon called “spin-orbit torque”. That magnetization can in turn change the resistance of an electrical connection via a different phenomenon called “magnetoresistance”. This setup apparently works, and is fast, but the on-off ratios are inadequate with currently-known magnetoresistance setups. On the other hand, there are numerous ways to get magnetoresistance, and I have no particular reason to believe that Science has discovered all of them already.
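Here is the NEMS arithmetic promised above (my own illustrative numbers; the factor of 20 is the same noise-floor factor as in the main text):

```python
k, T, e = 1.380649e-23, 300, 1.602176634e-19

# The reliable-switching threshold scales as ~20*kT/q, so putting more charge
# on the moving element divides the required actuation voltage:
for n in (1, 10, 100, 1000):   # charge in units of the electron charge e
    v = 20 * k * T / (n * e)
    print(f"q = {n:>4} e  ->  switching voltage ~ {v * 1e3:8.2f} mV")
```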

(Thanks again to Jacob for patience during that very long discussion, and sorry that I’m giving up without us reconciling. Thanks to Charlie Steiner for critical comments on a draft.)

  1. ^

    OK, fine, I can imagine being convinced that there are inevitably up to 20 kT per bit (or whatever) that need to be dissipated for low-noise communication in certain circumstances, independent of interconnect length. But we can round that to zero—this post is talking about much larger amounts of energy than that.
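    (For scale:)

```python
k, T = 1.380649e-23, 300
print(f"20 kT ~ {20 * k * T:.1e} J per bit")   # ~8e-20 J, vs the ~1e-14 J/bit/mm
                                               # in the main text: ~5 OOM smaller
```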

  2. ^

    Sometimes a wire will be in a “hi-Z” state, i.e. connected to neither of the two voltage sources, but that won’t matter for this post.

  3. ^

    This whole footnote should be ignored by any readers willing and able to evaluate physics arguments on their merits. But other readers may be forced to rely on an Expert Credentials Pissing Contest, so this footnote exists to provide them with some relevant information. Here goes. I mentioned the NSF thing not because it is totally impossible for the NSF to have thrown $50M over 10 years at a set of projects that are fundamentally doomed on account of basic laws of physics, without anyone noticing. I do think that’s possible. But I think it’s pretty unlikely in this context. (And I say this despite generally thinking very little of the NSF’s competence.) At the very least, I think the NSF thing provides strong evidence that Eli Yablonovitch’s way of thinking has survived much much more expert physicist scrutiny than your average obscure peer-reviewed paper or two. Separately, I claim that Eli Yablonovitch in particular has a legible track record of being correct when he derives and/​or popularizes conceptual insights based on basic physics principles. For example he was one of the inventors of the now-mainstream idea of “photonic bandgaps”, as well as the well-known “Yablonovitch limit” for light concentration in solar cells, among other things. While we’re at it, I can also list my own relevant credentials—physics PhD with extensive coursework in solid-state physics, semiconductor physics, semiconductor materials science, nanoelectronics, regular electronics, and thermodynamics, plus I’m listed on a transistor patent, albeit a pretty weird one, and one time I read a whole excruciatingly boring textbook on transistor noise… I can carry on if anyone cares to inquire in the comments.

  4. ^

    A voltage of $\approx 20\,kT/q \approx 500$ mV or so is required for normal transistors, and more broadly for any low-noise room-temperature switch whose mechanism involves (1) an energy barrier preventing current flow, and (2) a voltage V that raises or lowers this energy barrier by ±qV, where q is the elementary charge. The reason is as follows. Any switch in this category has the problem that the Boltzmann distribution enables some current flow even when the energy barrier is nonzero. This undesired current gradually cuts off as the energy barrier gets up to a sufficiently large multiple of kT. In the lingo, this problem manifests as the 60 mV/​decade “thermionic limit” of “subthreshold slope” for transistors.
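    (Quick check that 60 mV/decade is just $(kT/q)\ln 10$ at room temperature:)

```python
import math
k, T, q = 1.380649e-23, 300, 1.602176634e-19
print(f"{(k * T / q) * math.log(10) * 1e3:.0f} mV per decade")   # ~60 mV/decade
```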

  5. ^

    I think in the case of interest, $Z_{\text{eff}}$ is the same as the transmission line characteristic impedance, which tends to be 50-100 Ω. (This isn’t arbitrarily changeable—it’s pretty hard to get much above or below that for wires; it’s ultimately related to the 377 Ω impedance of free space, a fundamental constant of nature.) It’s not impossible AFAIK that the voltage source impedance is also relevant to $Z_{\text{eff}}$, but that wouldn’t change the discussion here.

  6. ^

    You may recall from freshman physics that the energy required to charge a capacitor is $\tfrac{1}{2}CV^2$ rather than $CV^2$ like I said here. But squeezing out that extra ½ requires using a current source rather than a voltage source (or equivalently, ramping up the source voltage gradually as the capacitor charges up). That seems pretty unrealistic in the context of chips, I think, so I left out the ½. If I’m wrong, whatever, factors of 2 don’t really matter for this post anyway.

  7. ^

    [THIS PART WAS EDITED JUNE 22] The formula for the stray capacitance of a wire is $C \approx a k \varepsilon_0 L$, where k is the dielectric constant of the so-called “interlayer dielectric” (maybe 2-3, or theoretically as low as 1 if you can use air-gaps between wires), and a is a unitless factor related to geometry, something like “how close together are the wires relative to their size”. There’s some discussion in the comments section here. According to that comments-section discussion, the lowest Intel can get in practice right now is somewhat above $ak \approx 10$. I used $ak \approx 10$ in the text to account for future technological development, and because it’s a nice round number. I can’t rule out that it will get even lower than 10 in the future, but this is just a Fermi estimate anyway.

  8. ^

    These parameters obviously matter to the person designing the wire, but once the wire is there, it stops mattering from an electronics perspective, I claim. Also note that semiconductors are very different from metals in that respect: Mobility and charge carrier concentration are independently centrally important in semiconductors, whereas only their product is important in metals. There is a good reason for this: namely, the number of charge carriers in a metal is so high that it cannot possibly noticeably change during operation, whereas the number of charge carriers in a semiconductor is sufficiently low that it can substantially jump around in real time during operation, for example via the field effect. OK fine, there are some cases where people care about the charge-carrier mobility in a metal, like in principle one could make a metallic Hall effect sensor, but the situation I’m discussing here is not one of those.