Are supercomputers the right target to benchmark against? My naive model is that they’ll be heavily optimized for things like FLOPs and bandwidth and not be particularly concerned with power usage or weight. What about systems that are more concerned about power-efficiency or weight?
So, I did some more research, and the general view is that GPUs are more power-efficient than CPUs in terms of FLOPS/watt. The most power-efficient of those right now is the Nvidia 1660 Ti, which comes to 11 TeraFlops at 120 watts, so 0.000092 PetaFlops/Watt, which is about 6x more efficient than Fugaku. It also weighs about 0.87 kg, which works out to 0.0126 PetaFlops/kg, or about 7x better than Fugaku. These numbers are still within an order of magnitude of the supercomputer figures, and they don't take into account the overhead of things like cooling, the case, and the CPU/memory you would presumably need to coordinate the GPUs in a server rack.
I used the supercomputers because the numbers were a bit easier to get from the Top500 and Green500 lists, and I also thought that their numbers include the various overhead costs to run the full system, already packaged into neat figures.
Even further research shows the most recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti, at 36 TeraFlops, 350 watts, and 2.2 kg, which works out to 0.0001 PetaFlops/Watt and 0.016 PetaFlops/kg. Once again, they’re within an order of magnitude of the supercomputers.
The new Apple M1-based Mac mini appears to be able to do 2.6 TeraFlops on a power consumption of 39 W. That comes out to about 0.000067 PetaFlops/Watt, or ~4x the efficiency of Fugaku.
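For anyone who wants to check the arithmetic, here is a rough Python sketch of the unit conversions used in this thread. The input figures are just the ones quoted above (not independently verified), the function and variable names are my own, and the Mac mini's weight is left out because nobody gave one.

```python
# Figures as quoted in the comments above: (teraflops, watts, kilograms).
# These are the thread's numbers, not independently verified specs.
# No weight was given for the Mac mini, so it is left as None.
DEVICES = {
    "Nvidia 1660 Ti": (11.0, 120.0, 0.87),
    "Nvidia RTX 3090": (36.0, 350.0, 2.2),
    "Apple M1 Mac mini": (2.6, 39.0, None),
}

TFLOPS_PER_PFLOPS = 1000.0


def efficiency(tflops, watts, kg=None):
    """Return (PFLOPS per watt, PFLOPS per kg); the second is None without a weight."""
    pflops = tflops / TFLOPS_PER_PFLOPS
    per_watt = pflops / watts
    per_kg = pflops / kg if kg is not None else None
    return per_watt, per_kg


if __name__ == "__main__":
    for name, (tflops, watts, kg) in DEVICES.items():
        per_watt, per_kg = efficiency(tflops, watts, kg)
        kg_text = f"{per_kg:.4f} PFLOPS/kg" if per_kg is not None else "weight not given"
        print(f"{name}: {per_watt:.6f} PFLOPS/W, {kg_text}")
```

The results should match the per-watt and per-kg figures quoted above up to rounding. Comparing against Fugaku would need its FLOPS, power, and weight figures, which aren't stated in this thread, so I've left that ratio out.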