Thanks for writing this! I’d like to put together a few hopefully relevant remarks.
The gap between theory and practice is especially pronounced in computer science. Theory is governed by its own inner logic (what is more tractable, what leads to richer, more intellectually satisfying formalisms, and so on). As a result, the fit of theory to practical matters is often so-so.
For example, in run-time complexity, the richest, most developed, and most widely taught theory deals with worst-case complexity, whereas in practice average-case complexity tends to be more relevant (e.g. the simplex method, quicksort, and so on).
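A minimal sketch of that gap (my own illustration, not anything from the post): a naive quicksort with a first-element pivot does roughly n log n comparisons on random input but roughly n²/2 on already-sorted input, so the worst-case bound says little about typical behavior.

```python
# Count comparisons made by a naive quicksort on random vs. already-sorted input.
import random
import sys

sys.setrecursionlimit(10000)  # sorted input drives recursion depth to ~n

def quicksort(xs, counter):
    """Naive quicksort with first-element pivot; counter[0] tallies pivot comparisons."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    counter[0] += len(rest)
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

n = 2000
random_input = [random.random() for _ in range(n)]
sorted_input = sorted(random_input)

for name, data in [("random", random_input), ("sorted", sorted_input)]:
    c = [0]
    quicksort(data, c)
    print(f"{name:>6} input: {c[0]} comparisons")  # sorted input is ~n^2/2
```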
Computers operating in the real world are at the very least performing “computations with an oracle” (somewhat modeled by type-2 Turing machines), with the world acting as the oracle. If one believes, with Penrose, that the world is not computable, then the “Penrose argument” falls apart, because a computer coupled to a non-computable world is no longer confined to computable behavior. If one believes, with some other thinkers, that the world is computable, then any valid version of the “Penrose argument” would apply equally to humans.
In reality, even the “computation with an oracle” paradigm is not quite adequate for the real world: when computers write into the external world and read from it, they can activate “computational circuits” in that external world, and so the notion of a strict boundary between the computations inside the computer and the dynamics of the external world becomes inadequate.
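Here is a toy sketch of what I mean (all names below are illustrative, nothing more): the program's next step depends on answers read back from the world, and each write triggers the world's own update, so part of the overall computation happens outside the machine.

```python
class ExternalWorld:
    """Stands in for the environment acting as an oracle."""
    def __init__(self):
        self.state = 0

    def write(self, value):
        # The write itself activates dynamics in the world: the world performs
        # its own update, i.e. part of the computation now happens outside
        # the computer.
        self.state = (self.state + value) * 2

    def read(self):
        return self.state

def internal_computation(world, steps=5):
    x = 1
    for _ in range(steps):
        world.write(x)         # output into the world...
        x = world.read() % 7   # ...and the next internal step depends on what comes back
    return x

print(internal_computation(ExternalWorld()))
```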
The theoretical computational paradigms are not very hardware-friendly (e.g. neither Turing machines nor lambda calculus map onto our computers all that well). At the same time, a typical computer does not have a well-developed autonomous capability to expand its memory without limit and is therefore not Turing-complete in the formal sense (e.g. with bounded memory the halting problem stops being undecidable, and so on).
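A minimal sketch of the last point (my illustration): a machine with bounded memory has only finitely many configurations, so it either halts or revisits a configuration, and a revisit proves it loops forever. That makes halting decidable by bookkeeping.

```python
def halts_with_bounded_memory(step, initial_config):
    """step(config) -> next config, or None when the machine halts.
    Configs must be hashable and drawn from a finite set (bounded memory)."""
    seen = set()
    config = initial_config
    while config is not None:
        if config in seen:
            return False  # revisited configuration => it will loop forever
        seen.add(config)
        config = step(config)
    return True  # reached a halting configuration

# Example: an 8-bit counter that halts once it wraps around to 0.
def counter_step(c):
    c = (c + 3) % 256
    return None if c == 0 else c

print(halts_with_bounded_memory(counter_step, 1))  # True, decided in finite time
```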
In particular, these days we are missing a GPU-friendly computational paradigm. It is possible to create a neuromorphic computational paradigm, and such paradigms go back decades, but a typical incarnation would not be GPU-friendly, so a sizeable gap between theory and practice would remain.
For example, results on the Turing universality of RNNs with memory expanding in an unlimited fashion go back to at least 1987. However, those results originally relied on unlimited-precision reals and abused the binary expansions of those reals as the tapes of Turing machines. This is incompatible with the key property of practical neural computations, namely their relative robustness to noise, and that incompatibility is yet another example of the gap between theory and practice.
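To make the trick concrete, here is a minimal sketch in the spirit of those constructions (not the exact 1987 encoding): a stack of bits is packed into the fractional binary expansion of a single real, with push and pop implemented by arithmetic.

```python
def push(x, bit):
    """Prepend a bit to the stack encoded in the fractional expansion of x."""
    return (x + bit) / 2.0

def pop(x):
    """Return (top bit, remaining stack)."""
    bit = int(x >= 0.5)
    return bit, 2.0 * x - bit

x = 0.0
for b in [1, 0, 1, 1]:        # push the word 1,0,1,1
    x = push(x, b)

bits = []
for _ in range(4):
    b, x = pop(b if False else x)[0], pop(x)[1]  # see note below
    bits.append(b)
print(bits)                    # [1, 1, 0, 1]: bits come back in reverse (stack) order

# The catch: an unboundedly long tape needs unbounded precision, and a tiny
# perturbation of x scrambles the deep bits -- exactly the non-robustness to
# noise mentioned above.
```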
More recently, this particular incompatibility has been fixed. One can take an RNN and replace neurons handling scalar flows with neurons handling vector flows, using countably-indexed vectors with a finite number of non-zero coordinates at any given moment, and replacing linear combinations of scalar inputs with linear combinations of vector inputs (“generalized attention”). One then gets a theoretically nice neuromorphic platform for stream-oriented computations which is more robust to noise. It is still not GPU-friendly in its full generality.
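A minimal sketch of the data structure involved (my notation, not necessarily the exact formulation): countably-indexed vectors with finite support can be represented as sparse dicts, and a neuron combines whole vector inputs with scalar weights, much as attention combines value vectors with scalar coefficients.

```python
from collections import defaultdict

def linear_combination(weights, vectors):
    """Sum_i weights[i] * vectors[i], where each vector is a sparse dict
    {coordinate_index: value} with finitely many non-zero entries."""
    out = defaultdict(float)
    for w, v in zip(weights, vectors):
        for idx, val in v.items():
            out[idx] += w * val
    return {idx: val for idx, val in out.items() if val != 0.0}

# Two sparse "vector signals" over a countable index set:
v1 = {0: 1.0, 5: 2.0}
v2 = {5: -1.0, 42: 0.5}

print(linear_combination([1.0, 2.0], [v1, v2]))  # {0: 1.0, 42: 1.0}; coordinate 5 cancels
```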
The remarks above are about computational architectures. Much more can be said about the great diversity of possible methods for generating (synthesizing) computations. When our architectures are neuromorphic rather than discrete, a greater variety of such methods is potentially available (not just gradient methods, but, for example, decomposition of connectivity matrices into series, and so on).
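As a hedged illustration of one possible reading of “decomposition of connectivity matrices into series”: a connectivity matrix can be approximated by a truncated sum of rank-1 terms (here via SVD), so a network could in principle be synthesized or edited term by term rather than only by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))           # a stand-in connectivity matrix

U, s, Vt = np.linalg.svd(W)

def truncated(k):
    """Sum of the first k rank-1 terms s_i * u_i v_i^T."""
    return sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

for k in (1, 4, 8):
    err = np.linalg.norm(W - truncated(k)) / np.linalg.norm(W)
    print(f"rank-{k} series: relative error {err:.3f}")  # error shrinks to ~0 at k=8
```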