You can theoretically run a model on fewer GPUs by putting just the first layer into GPU memory, running the forward pass through it, then deleting it and loading the second layer from RAM, and so forth (see ZeRO-Infinity). But this comes with high latency.
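A minimal sketch of that loop, assuming PyTorch and a hypothetical `load_layer_weights(i)` helper that hands back one layer with its weights still in host RAM:

```python
import torch

def forward_offloaded(x, num_layers, load_layer_weights, device="cuda"):
    """Forward pass keeping only one layer's weights on the GPU at a time.

    `load_layer_weights(i)` is a hypothetical helper returning layer i as an
    nn.Module whose weights live in host (CPU) RAM.
    """
    h = x.to(device)
    for i in range(num_layers):
        layer = load_layer_weights(i).to(device)  # copy this layer's weights host -> GPU
        with torch.no_grad():
            h = layer(h)                          # forward through just this layer
        del layer                                 # drop the GPU copy of the weights
        torch.cuda.empty_cache()                  # release cached blocks so the next layer fits
    return h
```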
Latency shouldn’t be a problem, since you can pipeline, at least as long as you don’t run into Little’s Law problems.
(Depending on the structure of the connection matrix, you may even be able to pipeline at sub-layer granularity.)
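A rough sketch of the pipelining idea, again assuming PyTorch and the same hypothetical `load_layer_weights` helper: fetch layer i+1 on a separate CUDA stream while layer i is computing (the overlap only actually happens if the host-side weights are in pinned memory).

```python
import torch

def forward_pipelined(x, num_layers, load_layer_weights, device="cuda"):
    """Overlap the host->GPU transfer of layer i+1 with the compute of layer i."""
    copy_stream = torch.cuda.Stream()
    h = x.to(device)

    # Kick off the first transfer on the copy stream.
    with torch.cuda.stream(copy_stream):
        next_layer = load_layer_weights(0).to(device, non_blocking=True)

    for i in range(num_layers):
        # Make sure layer i has finished copying before we compute on it.
        torch.cuda.current_stream().wait_stream(copy_stream)
        layer = next_layer

        # Start fetching layer i+1 while layer i runs.
        if i + 1 < num_layers:
            with torch.cuda.stream(copy_stream):
                next_layer = load_layer_weights(i + 1).to(device, non_blocking=True)

        with torch.no_grad():
            h = layer(h)
        del layer
    return h
```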
GPU bus bandwidth is likely more of a problem. PCIe gen3x16 is “only” ~16GB/s.
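Back-of-envelope, taking a hypothetical 175B-parameter model in fp16 as the example, that bandwidth dominates per-pass latency unless you can hide the transfers behind compute:

```python
params = 175e9          # assumed model size (parameters); purely illustrative
bytes_per_param = 2     # fp16
pcie_gen3_x16 = 16e9    # ~16 GB/s, the figure above

model_bytes = params * bytes_per_param            # ~350 GB of weights
seconds_per_pass = model_bytes / pcie_gen3_x16    # ~22 s just streaming weights once
print(f"~{seconds_per_pass:.0f} s per forward pass spent on PCIe transfers")
```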
A more formal way of describing much of this might be the Kolmogorov complexity of the state of your consciousness over the timeframe (so outputting t=0: state=blah; t=1: state=blah; etc.).
This has many of the features you are looking for.
This leads me to an interesting question: is looping in an infinite featureless plain of flat white any worse than looping in an infinite featureless plain of random visual noise?
(Of course, this is both noncomputable and carries a nontrivial chance that the Turing machine attaining the Kolmogorov complexity is itself simulating you, but meh. Details.)
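For the flat-white vs. noise question, a computable stand-in is to use an ordinary compressor as a (very loose) upper bound on Kolmogorov complexity and compare the two traces. This is only an illustrative proxy, not the actual K; the frame sizes and counts below are made up:

```python
import os
import zlib

frames = 1000
frame_bytes = 64 * 64  # a tiny 64x64 greyscale "visual field" per timestep

flat_white = b"\xff" * (frame_bytes * frames)  # identical frame at every timestep
noise = os.urandom(frame_bytes * frames)       # fresh random noise at every timestep

for name, trace in [("flat white", flat_white), ("random noise", noise)]:
    compressed = len(zlib.compress(trace, 9))
    print(f"{name}: {len(trace)} bytes -> {compressed} bytes compressed")

# The flat-white trace compresses to almost nothing; the noise trace barely
# compresses at all, i.e. the proxy assigns it vastly higher complexity.
```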