You seem to be talking about a compute-dominated process, with almost perfect data locality. I suspect that brain emulation may be almost entirely communication-dominated with poor locality and (comparatively) very little compute. Most neurons in the brain have a great many synapses, and the graph of connections has relatively small diameter.
So emulating any substantial part of a human brain may well need data from most of the brain every “tick”. Suppose emulating a brain in real time takes 10 units per second of compute, and 1 unit per second of data bandwidth (in convenient units where a compute node has 10 units per second of each). So a single node is bottlenecked on compute and can only run at real time.
To achieve 2x speed you can run on two nodes to get 20 units per second of compute capability, but your data bandwidth requirement is now 4 units/second: both nodes need full access to the data, and they need it in half the time. Beyond about 3x speed-up there is no further benefit to adding nodes. They all hit their I/O capacity, and adding more will just slow them all down, since every node needs to access every node's data every tick.
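The toy numbers above can be sketched in a few lines. This is my own reading of the model (n = speedup nodes, each needing the full state every tick, with a shared interconnect whose capacity equals one node's 10 units/s); the units are the made-up ones from the text, not real estimates.

```python
import math

COMPUTE_PER_NODE = 10.0    # compute units/s per node
IO_CAPACITY = 10.0         # assumed shared interconnect capacity, units/s
REALTIME_COMPUTE = 10.0    # compute needed to run at 1x
REALTIME_BANDWIDTH = 1.0   # bandwidth needed to run at 1x

def requirements(speedup):
    """Return (nodes needed, total bandwidth demand) at a given speedup."""
    # Compute scales linearly: n nodes give n * 10 units/s.
    nodes = max(1, math.ceil(speedup * REALTIME_COMPUTE / COMPUTE_PER_NODE))
    # Every node needs the full state every tick, and ticks come
    # `speedup` times faster, so demand scales as nodes * speedup.
    bandwidth = nodes * speedup * REALTIME_BANDWIDTH
    return nodes, bandwidth

for s in [1, 2, 3, 4]:
    n, bw = requirements(s)
    status = "ok" if bw <= IO_CAPACITY else "exceeds I/O capacity"
    print(f"{s}x: {n} nodes, bandwidth demand {bw:.0f} units/s, {status}")
```

Under these assumptions 2x costs 4 units/s of bandwidth as in the text, 3x just fits (9 of 10 units/s), and 4x already demands 16, so the speedup tops out at roughly 3x.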
This is even making the generous assumption that links between nodes have the same capacity and no more latency or coordination issues than a single node accessing its own local data.
I’ve obviously just made up numbers to demonstrate scaling problems in an easy way here. The real numbers will depend upon things we still don’t know about brain architecture, and on future technology. The principle remains the same, though: different resource requirements scale in different ways, which yields a “most efficient” speed for given resource constraints, and it likely won’t be at all cost-effective to vary from that by an order of magnitude in either direction.
Yeah, maybe my intuition was pointing a different way: that the brain is a physical object, physics is local, and the particular physics governing the brain seems to be very local (signals travel at tens of meters per second). And signals from one part of the brain to another have to cross the intervening space. So if we divide the brain into thousands of little cubes, then each one only needs to be connected to its six neighbors, while having plenty of interesting stuff going on inside—rewiring and so on.
Edit: maybe another aspect of my intuition is that “tick” isn’t really a thing. Each little cube gets a constant stream of incoming activations, at time resolution much higher than typical firing time of one neuron, and generates a corresponding outgoing stream. Generating the outgoing stream requires simulating everything in the cube (at similar high time resolution), and doesn’t need any other information from the rest of the brain, except the incoming stream.
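One way to make the locality intuition concrete: with a finite conduction velocity, each cube only needs neighbor state from within the distance a signal can travel in one update step. The velocities and step size below are illustrative guesses, not measured brain parameters.

```python
# Sketch of the "little cubes" intuition: a cube only needs a thin
# "halo" of neighbor state per update step, because signals can only
# travel so far in that time.
def halo_thickness_mm(conduction_velocity_m_s, step_s):
    """Distance a signal can travel in one update step, in millimetres."""
    return conduction_velocity_m_s * step_s * 1000.0  # metres -> mm

# Assumed 0.1 ms update step, and signal speeds of tens of m/s:
for v in [1.0, 10.0, 100.0]:
    print(f"{v:5.0f} m/s -> halo {halo_thickness_mm(v, 1e-4):.2f} mm per step")
```

At tens of m/s and sub-millisecond steps the halo is on the order of a millimetre, so millimetre-scale cubes really would only need data from a thin shell of their neighbors each step, not from the whole brain.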
Thanks, making use of the relatively low propagation speed hadn’t occurred to me.
That would indeed reduce the scaling of data bandwidth significantly. The bottleneck would still exist, just less severely: area-versus-volume scaling still means that bandwidth comes to dominate compute as speeds increase (since the volume emulated per node shrinks), just not quite as rapidly.
I didn’t mean “tick” as a literal physical thing that happens in brains, just a term for whatever time scale governs the emulation updates.
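The area-versus-volume point above can be sketched as follows. This is my own formalisation in normalised units, not the commenter's numbers: n = speedup nodes each emulate a cube, and with locality each node exchanges only surface data with its neighbors.

```python
# With locality, per-node bandwidth demand grows like speedup^(1/3)
# rather than linearly in speedup.
def per_node_bandwidth(speedup, locality=True):
    nodes = speedup  # assume n = s nodes for an s-fold compute budget
    if locality:
        # Cube edge shrinks as (1/n)^(1/3), so surface area shrinks as
        # (1/n)^(2/3); ticks come `speedup` times faster.
        return speedup * nodes ** (-2.0 / 3.0)
    # Without locality, every node needs the full state every tick.
    return speedup

for s in [1, 8, 64]:
    print(f"{s:3d}x: local {per_node_bandwidth(s):.1f}, "
          f"non-local {per_node_bandwidth(s, locality=False):.1f}")
```

Going from 1x to 64x multiplies per-node bandwidth by 4 in the local model versus 64 in the all-to-all model, so the bottleneck still grows with speed, just much more slowly.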