Steven Byrnes comments on Thoughts on hardware / compute requirements for AGI

• You say “Having a lot of capacity appears to be important” but that’s “essentially assuming the conclusion”, right? hehe :)

You claim that there’s a lot of capacity, but I say we don’t really know that. As a stupid example, if my computer’s SRAM has N cells, but it uses an error-correcting code by redundantly storing each bit in three different cells, then its “capacity” is ⅓ the number of cells. (And in 6T SRAM, the number of cells is in turn ⅙ the number of transistors, etc.)
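To make the arithmetic in that example concrete, here is a minimal sketch. The function name, the parameter names, and the 18-million-transistor figure are made up for illustration; only the factor of 3 (redundant copies per bit) and the factor of 6 (transistors per 6T SRAM cell) come from the example above.

```python
def effective_capacity_bits(transistors: int,
                            transistors_per_cell: int = 6,
                            copies_per_bit: int = 3) -> int:
    """Usable bits after accounting for cell structure and redundancy."""
    # 6T SRAM: six transistors make up one storage cell.
    cells = transistors // transistors_per_cell
    # Error-correcting scheme from the example: each bit lives in three cells.
    return cells // copies_per_bit


# Hypothetical chip size, chosen only so the numbers divide evenly.
transistors = 18_000_000
print(effective_capacity_bits(transistors))  # 1000000
```

So counting transistors would overstate this toy chip's capacity by a factor of 18, and counting cells would overstate it by a factor of 3.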

Anyway, all things considered right now, the most plausible-to-me theory is that counting synapses gives a 2-3 OOM overestimate of capacity. I don’t see this as particularly implausible. For one thing, as I wrote in the OP, the synapse is not just an information-storage-unit, it’s also a thing-that-does-calculations. If one bit of stored information (e.g. information about how the world works) needs to be involved in 1000 different calculations, it seems plausible that it would need to be associated with 1000 synapses. For another thing, here’s a model where one functional “connection” requires a group of 10 nearby synapses onto the same dendrite. That’s 1 OOM right there! I think there’s another OOM or two lurking in the fact that each cortical minicolumn is 100 neurons and each cortical column is 100 minicolumns, and yet there’s some sense in which minicolumns (and to a lesser degree, columns) are a single functional unit. So, without getting into details, which I’m hazy on anyway, I wouldn’t be surprised to learn that “one connection” involved not only 10 nearby synapses on one dendrite, but a similar group of 10 synapses onto a neuron within each of 10 neighboring minicolumns, and those 10 minicolumns are working together to implement a certain kind of computation, which by the way you could trivially do on a GPU in a few clock cycles. Or whatever, I dunno.
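Here is a rough sketch of how those factors multiply. It is only an illustration of the guess above, not a model of the cortex: the function name is hypothetical, and the raw synapse count is a placeholder I'm assuming for the sake of the example rather than a figure from this thread.

```python
import math
from typing import Tuple


def capacity_overestimate(synapses_per_group: int,
                          groups_per_connection: int) -> Tuple[int, float]:
    """Return (raw synapses per functional connection, that factor in OOM)."""
    factor = synapses_per_group * groups_per_connection
    return factor, math.log10(factor)


# Only the dendrite-level grouping: 10 nearby synapses per connection.
print(capacity_overestimate(10, 1)[0])   # 10
# The same grouping repeated across 10 cooperating minicolumns.
print(capacity_overestimate(10, 10)[0])  # 100

# ASSUMPTION: placeholder raw synapse count, used only for illustration.
raw_synapses = 10**14
factor, oom = capacity_overestimate(10, 10)
effective_connections = raw_synapses // factor
print(effective_connections)  # 1000000000000
```

With both groupings the raw count overstates the number of functional connections by about 2 OOM; the "one bit used in 1000 calculations" story would instead give about 3 OOM on its own.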

Or maybe you’re saying “Having a lot of capacity appears to be important” because humans can do things that modern ML can’t, and we need to explain that somehow, and capacity seems like an obvious candidate? If so, I disagree: I think there are other, more plausible candidates. Again, see footnote 16 and the surrounding discussion.

“It could be that the preprocessing necessary to guide our future behavior unavoidably increases the amount of stored data by a large factor.”

You mean, cached computations or something? I’m not sure what you have in mind. Everything I can think of has some analogy in things-that-LLMs-can-do, or other types of sub-GB ML systems. LLMs do in fact have “behavior” of a sort, in the sense that they output text, and (implicitly) plan out multiple tokens in advance.

• I haven’t fully digested this comment, but:

“You mean, cached computations or something?”

In some sense there’s probably no option other than that, since creating a synapse should count as a computational operation. But there’d be different options for what the computations would be.

The simplest might just be storing pairwise relationships. That’s going to add size, even if sparse.
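As a toy illustration of why pairwise storage adds size even when sparse (the function names, the item count, and the links-per-item figure are all hypothetical, picked just to show the scaling):

```python
def dense_pairs(n_items: int) -> int:
    """Number of unordered pairs if every pair of items gets an entry."""
    return n_items * (n_items - 1) // 2


def sparse_pairs(n_items: int, links_per_item: int) -> int:
    """Number of entries if each item records only a fixed number of links."""
    return n_items * links_per_item


n = 1_000  # hypothetical number of stored items
print(dense_pairs(n))       # 499500
print(sparse_pairs(n, 10))  # 10000
```

Even in the sparse case, the table of relationships is 10 times larger than the list of items it describes.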

I agree that LLMs do that too, but I’m skeptical about claims that LLMs are near human ability. It’s not that I’m confident that they aren’t—it just seems hard to say. (I do think they now have surface-level language ability similar to humans, but they still struggle at deeper understanding, and I don’t know how much improvement is needed to fix that weakness.)