I am Farhan, based in Canada. My educational background is in chip design, with a PhD from Tokyo Institute of Technology. I am particularly interested in AI safety and climate action.
Farhan
Thank you. That answers a lot of my questions. I will re-read your paper. Let’s talk in detail in a couple of weeks. :)
Thanks for the detailed response, Max. This is certainly a very valid take. I am still brushing up on my information theory, and frankly I do not have a very deep understanding of the field.
My initial understanding, from a very shallow reading of your paper, was that it essentially calculates “bits passing through the logic unit per second”. However, having given it a little more attention, I realize that it goes deeper. Essentially, it calculates the change in bit values as they pass through the compute. (Is this framing correct? Suppose we have one input and the compute just reproduces the same value: would the estimate for this compute be 0? So does the amount of work done depend on the actual input values as much as it does on the underlying hardware? What if input1 is 0 or 1 and the compute op is a multiply, so that the output is just 0 or input2? Do we estimate that this op did half as much work as when both inputs are >1?) If this is the case, how does it measure a compute’s performance “rating” independently of the workload running on it?
(If I have entirely misunderstood the concept, please link something I can read to understand the underlying concepts.)
Also, is there an “information per time” component in the performance metric?
This is a very good point. I am studying the paper, and will incorporate it into our upcoming work. Thanks for bringing it up.
Thanks for your input.
This is one of our concerns, i.e. “utilization optimization makes more compute available, without moving the roof-line, and hence goes under the radar”. The deployment factor plays a big role in improving utilization, hence it is one of the layers we want to look at.
“I wonder if that 50% could even end up being used by rogue workloads”: I am a bit unclear on how this might work. Do you have any thoughts to share on possible scenarios?
Ah, if only I had as good a way with words as you :)
At this point, we are trying to get as much feedback as possible on the identified gaps. We strongly believe that a gap does exist and needs filling. However, we still need to solidify our ideas on alternative modeling methodologies and frameworks. Your paper is a good direction for us to look into. I really like the “computing as the transformation of information through a channel” framing. TPP is already being used in export controls, so there is clearly precedent for it.
How do you think this framing might capture deployment parameters or SRAM-based architectures?
My God. Are you telling me we could have just written a Dr. Seuss-style poem instead of this article? :) Thoroughly enjoyed the poem. We will study the “Back to bits” paper as we research in this direction. Thank you for the pointer.
Thanks for the comment. Would absolutely love feedback from Daniel. I think that advanced packaging, specifically 3D packaging, certainly has the potential for a high impact on memory bandwidth as well as power dissipation. With that said, I think the efficiency problem in AI hardware is being approached from several angles, where HW/SW co-design and novel deployment configurations also dictate how the compute changes over time.
“I do really like the holistic approach from manufacturing through operational outcomes” Happy to hear that :)
“what additional value does this broader evaluation bring? For whom?” I have quite a few ideas, but I am working on polishing them up a bit more. Expect to hear more from us :).
“I personally would like to see a continually-updated characterization of the ecosystem” :) Happy to hear that too. What might such a “continually-updated characterization” look like? A series of blogs? Videos? A consistent structured framework that gets updated continually? Any thoughts on this?