Off switch / FlexHEG / anti-tampering:
Putting the “verifier” on the same chip as the GPU seems like an approach worth exploring, as an alternative to anti-tampering (which seems hard).
I heard[1] that changing the logic running on a chip (such as subverting an off-switch mechanism) without breaking the chip seems potentially hard[2] even for a nation state.
If this is correct (or can be made correct), then it seems much more promising than having a separate verifier chip and GPU chip, with anti-tampering preventing them from being separated (which seems to be the current plan (cc @davidad), and seems hard).
@jamesian shared attacks that seem relevant, such as:
Courdesses built a custom laser fault injection system to avoid anti-glitch detection. A brief pulse of laser light to the back of the die, revealed by grinding away some of the package surface, introduced a brief glitch, causing the digital logic in the chip to misbehave and open the door to this attack.
It intuitively[3] seems to me that Nvidia could implement a much more secure version of this than Raspberry Pi did (serious enough to seriously bother a nation state), and maybe they already have. But I’m mainly sharing this as a direction that seems promising, and I’m interested in expert opinions.
From someone who built chips at Apple, someone else from Intel, and someone else with broad knowledge of security in general
Hard enough that they’d try attacking in some other way
It seems like the attacker has much less room for creativity here (compared to subverting anti-tampering), and it’s also rare[4] to hear engineers working on something guess that it IS secure.
See xkcd
Chips have 15+ metal interconnect layers, so if verification circuitry is placed in enough places physically, it probably can’t be circumvented. I’d guess a more challenging problem is replay attacks: to keep old (but legitimate) certificates that once enabled some computation from being reused indefinitely, the chip needs some sort of persistent internal clock or counter that can’t be reset.
Thanks! Could you say more about your confidence in this?
Yes, specifically I don’t want an attacker to reliably be able to reset it to whatever value it had when it sent the last challenge.
If the attacker can only reset this memory to 0 (for example, by unplugging it), then the chip can notice that something suspicious happened.
Another option is a reliable wall clock (though this seems less promising).
I think @jamesian told me about a reliable clock (in the sense of the clock signal used by chips, not a wall clock); I’ll ask.
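The counter scheme discussed above can be sketched in a few lines. This is a toy model, not how real attestation works: the shared HMAC key, the `Chip` class, and the certificate format are all illustrative assumptions. The point it shows is that a strictly consumed counter blocks replay, and that an attacker who can reset the counter gets replay back, which is exactly why the reset needs to be detectable or impossible.

```python
import hashlib
import hmac

SERVER_KEY = b"shared-secret"  # stand-in for real attestation key material


def issue_cert(counter: int, workload: str) -> bytes:
    # License server signs the chip's current counter value plus the permission.
    msg = f"{counter}:{workload}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()


class Chip:
    def __init__(self):
        self.counter = 0  # persistent and, ideally, non-resettable

    def check(self, cert: bytes, counter: int, workload: str) -> bool:
        expected = hmac.new(SERVER_KEY, f"{counter}:{workload}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(cert, expected):
            return False          # forged certificate
        if counter != self.counter:
            return False          # stale (replayed) or not-yet-valid certificate
        self.counter += 1         # consume: the same cert can never be reused
        return True


chip = Chip()
cert0 = issue_cert(0, "train")
assert chip.check(cert0, 0, "train")       # fresh certificate accepted
assert not chip.check(cert0, 0, "train")   # replay rejected

chip.counter = 0                           # attacker resets the counter...
assert chip.check(cert0, 0, "train")       # ...and the old certificate works again
```

The last two lines are the attack under discussion: everything cryptographic still checks out, so the defense has to live in the persistence of the counter itself.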
Training frontier models needs a lot of chips, so situations where “a chip notices something” (and any self-destruct-type mechanisms) are unimportant to the attacker: they can test on a few chips and do it differently next time. Conversely, complicated ways of circumventing verification or resetting clocks are not useful if they are too artisanal: they need to be applied to chips in bulk, and those chips then need to work for weeks in a datacenter without further interventions (beyond whatever can be made part of the datacenter).
AI accelerator chips have 80B+ transistors, much more than an instance of certificate-verification circuitry would need, so you can place multiple instances (and have them regularly recheck the certificates). There are EUV-pitch metal connections several layers deep within a chip; you’d need to modify many of them, all over the chip, without damaging the layers above. So I expect this to be completely infeasible for 10K+ chips, on general principle (rather than from specific knowledge of how any of this works).
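The “multiple instances” idea can be sketched as follows. The structure is hypothetical (this is not Nvidia’s design, and `VerifierBlock` and the HMAC certificate are illustrative): several independently placed verifier blocks each recheck the same certificate, the chip only keeps running if every block agrees, and so a physical patch has to find and disable all of them.

```python
import hashlib
import hmac

KEY = b"attestation-key"  # placeholder for a key fused into the die


def make_cert(payload: bytes) -> bytes:
    return hmac.new(KEY, payload, hashlib.sha256).digest()


class VerifierBlock:
    """One verification instance; imagine many scattered across the die."""

    def __init__(self):
        self.disabled = False  # what a FIB edit would try to achieve

    def verify(self, payload: bytes, cert: bytes) -> bool:
        if self.disabled:
            return True  # a patched block always says yes
        return hmac.compare_digest(cert, make_cert(payload))


def chip_may_run(blocks, payload, cert) -> bool:
    # AND over all instances: every block must independently approve.
    return all(b.verify(payload, cert) for b in blocks)


blocks = [VerifierBlock() for _ in range(8)]
good = make_cert(b"job")
assert chip_may_run(blocks, b"job", good)
assert not chip_may_run(blocks, b"job", b"\x00" * 32)

# Patching 7 of 8 blocks is not enough; the attacker must locate them all.
for b in blocks[:7]:
    b.disabled = True
assert not chip_may_run(blocks, b"job", b"\x00" * 32)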
For clocks or counters, I guess AI accelerators normally don’t have any rewritable persistent memory at all, and I don’t know how hard it would be to add some in a way that makes it too complicated to keep resetting automatically.
My guess is that AI accelerators will have some difficult-to-modify persistent memory based on similar chips having it, but I’m not sure if it would be on the same die or not. I wrote more about how a firmware-based implementation of Offline Licensing might use H100 secure memory, clocks, and secure boot here: https://arxiv.org/abs/2404.18308
Changing the logic of chips is possible:
https://www.nanoscopeservices.co.uk/fib-circuit-edit/
h/t @Jonathan_H from TemperSec
Open question: How expensive is this, and specifically, can it be done at scale for the chips of an entire data center?
TL;DR: Less than you think, likely < 1000 USD.
The cost for renting such a machine (FIB) is 100-350 USD/h (depending on which university lab you choose). Some universities also offer to have one of their staff do the work for you (e.g., 165 USD/h at the University of Washington).
The duration for a single modification is less than 1 hour.
Additionally, there is some non-FIB preparation time, which seems to be ~1 day if you do it for one chip; see here: https://arxiv.org/pdf/2501.13276
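Plugging in the figures above (100-350 USD/h rental, under an hour of FIB time per modification, ~1 day of non-FIB prep), a back-of-envelope estimate of the per-chip machine cost, and of what this implies for the at-scale question: machine time, not money, looks like the binding constraint.

```python
# Figures quoted in the thread; prep cost is left as time since no rate is given.
FIB_RATE_LOW, FIB_RATE_HIGH = 100, 350  # USD per FIB machine-hour
FIB_HOURS_PER_CHIP = 1.0                # "less than 1 hour" per modification
PREP_DAYS_FIRST_CHIP = 1.0              # non-FIB prep, from the linked paper

low = FIB_RATE_LOW * FIB_HOURS_PER_CHIP
high = FIB_RATE_HIGH * FIB_HOURS_PER_CHIP
print(f"FIB rental per chip: {low:.0f}-{high:.0f} USD")

# Scaling to a datacenter: 10,000 chips at ~1 FIB-hour each is ~10,000
# machine-hours, i.e. more than a year of 24/7 time on a single machine.
fleet_hours = 10_000 * FIB_HOURS_PER_CHIP
years_one_machine = fleet_hours / (24 * 365)
print(f"10K chips: {fleet_hours:.0f} FIB-hours "
      f"(~{years_one_machine:.1f} years on one machine)")
```

So even at these low per-chip prices, modifying a whole fleet means either monopolizing many FIB machines for a long time or finding an attack that amortizes across chips.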
I am currently mentoring a SPAR project that calculates more accurate numbers and maps them to specific attack scenarios. We plan to release our results in 2-3 months.
Thanks! Is this true for a somewhat-modern chip that makes at least some slight attempt at defense, or more like the chip on a Raspberry Pi?
I asked Claude how relevant this is to protecting something like an H100; here are the parts that seem most relevant, from my limited understanding:
What the paper actually demonstrates:
1. Reading (not modifying) data from antifuse memory in a Raspberry Pi RP2350 microcontroller
2. Using Focused Ion Beam (FIB) and passive voltage contrast to extract information
Key differences between this and modifying an H100 GPU:
3D Transistor Structures: Modern 5nm chips use FinFET or GAAFET 3D structures rather than planar transistors. The critical parts are buried within the structure, making them fundamentally more difficult to access without destroying them.
Atomic-Scale Limitations: At 5nm, we’re approaching atomic limits (silicon atoms are ~0.2nm). The physics of matter at this scale creates fundamental boundaries that better equipment cannot overcome.
Ion Beam Physics: Even with perfect equipment, ion beams create interaction volumes and damage zones that become proportionally larger compared to the target features at smaller nodes.
That’s correct.
That said, chip modifications are done on the same FIB machine. The cost estimate still seems accurate to me.
H100s are manufactured on one of TSMC’s leading-edge nodes (the “nm” figures are brand names, not feature sizes), with a gate pitch of 48 nm and a metal pitch of 24 nm. The minimum feature size is 9-12 nm, according to Claude.
You are not at the physical limits:
Gallium-based FIB circuit edits can go down to ~7 nm.
Helium-based FIB circuit edits (~3-5x more expensive than gallium FIB) can go down even further, to 1-2 nm.
I’d attack the silicon from the backside of the wafer. Currently, nearly no chip has conductive traces or protection mechanisms on the backside, and you can get directly to the transistors and gates without needing to penetrate the interconnect stack on the front side of the wafer.
(Though I want to flag that I’ve heard there is a push to move power distribution to the backside of the wafer, so this might become harder in 2-6 years.)
I also want to flag that the attack we are discussing here (modifying the logic within the H100 die) is the most advanced invasive attack I can currently think of. Another simpler attack is to read out the secret key used for authentications. Or even simpler, you could replace the CEC1736 Root-of-Trust chip on the H100-PCB (which authenticates the H100 onboard flash) with a counterfeit one.
A paper that further elaborates on the attack vectors is coming out in 2-3 months.