Narrative Theory. Part 6. Artificial Neural Networks

“Alan Turing started off by wanting to 'build the brain' and ended up with a computer”
- Henry Markram, The Blue Brain Project

Recently I’ve come to terms with the idea that I have to publish my research even if it feels unfinished or slightly controversial. The mind is too complex (who would have thought); each time you think you’ve got something, a new bit comes up and crushes your model. Time after time after time. So waiting for even remotely good answers is not an option. I have to “fail fast”, even though that’s not a widely accepted approach among scientists nowadays.

With that said, reinforcement learning and an in-depth analysis of the models mentioned here will be covered later. The goal of this part is to explain the reasoning behind the choice of the surface area.

Artificial Neural Networks are the face of modern artificial intelligence and its most successful branch. But success, unfortunately, doesn’t mean biological plausibility. Even though most ML algorithms have been inspired by aspects of biological neural networks, the final models end up pretty far from the source material. This makes their usefulness for the quest of reverse-engineering the mind questionable. What I mean here is that almost no insights can be brought directly back to neuroscience to help with the research. I’ll explain why in a bit. (Note: this doesn’t mean they can’t serve as an inspiration. That is very much possible and, I’m sure, a good idea.)

There are three main show-stoppers:

(Reason #1) is the use of an implausible learning algorithm (read: backpropagation). There have been numerous attempts to find a biologically plausible analogue of backpropagation, but as far as I know all of them fell short. The core objection to the biological plausibility of backpropagation is that weight updates in multi-layered networks require access to information that is non-local (i.e. error signals generated by units many layers downstream). In contrast, plasticity in biological synapses depends primarily on local information (i.e., pre- and post-synaptic neuronal activity)[1].
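
To make the locality point concrete, here is a minimal numpy sketch (toy sizes, illustrative learning rate) contrasting the two kinds of update: the backprop rule for a hidden-layer weight matrix needs the downstream weights and the output error, while a Hebbian-style rule needs only the activity on either side of the synapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: x -> h -> y, squared-error loss.
x = rng.normal(size=3)          # input
W1 = rng.normal(size=(4, 3))    # first-layer weights
W2 = rng.normal(size=(2, 4))    # second-layer weights
target = rng.normal(size=2)

h = np.tanh(W1 @ x)             # hidden activity
y = W2 @ h                      # output
err = y - target                # error computed at the output layer

# Backprop update for W1: needs err and W2, i.e. information that lives
# downstream of the synapse being updated -- non-local from W1's point of view.
delta_h = (W2.T @ err) * (1 - h**2)
dW1_backprop = -0.01 * np.outer(delta_h, x)

# Hebbian-style update for W1: only pre-synaptic activity (x) and
# post-synaptic activity (h) -- the kind of local information a real
# synapse plausibly has access to.
dW1_hebbian = 0.01 * np.outer(h, x)
```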

(Reason #2) is the fact that ANNs are used to solve “synthetic” problems. The vast majority of ANNs originated in industry, designed to solve some practical real-world problem. For us, this means that the training data used for these models has almost nothing in common with the human ontogenetic curriculum (or any part of it), which prevents us from using them for this kind of research.

(Reason #3) is the use of implausible building blocks and network morphology, resulting in implausible neural dynamics (e.g. the use of point neurons instead of full-blown multi-compartment neurons, or of just STDP instead of all types of neural interaction). We still don’t know how crucial those alternative modes are, but the consensus on this matter is “we need more than we use right now”.
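
STDP is the one alternative mode that is easy to state precisely, so for readers who haven’t met it: below is a minimal sketch of the standard pair-based rule found throughout the spiking-network literature. The amplitudes and time constants are illustrative placeholders, not taken from any particular model.

```python
import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012,
            tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP: weight change as a function of the spike-time
    difference between one pre- and one post-synaptic spike (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:       # pre fires before post -> potentiation
        return a_plus * np.exp(-dt / tau_plus)
    else:            # post fires before (or with) pre -> depression
        return -a_minus * np.exp(dt / tau_minus)

print(stdp_dw(t_pre=10.0, t_post=15.0))   # potentiation
print(stdp_dw(t_pre=15.0, t_post=10.0))   # depression
```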

However, there are three notable exceptions:

(The first exception) is convolutional neural networks and their successors. Their design was copied from the mammalian visual cortex, and they are considered sufficiently biologically plausible. The success of ConvNets is based on design principles specific to the visual cortex, namely shared weights and pooling[2]. The area of applicability of these principles is an open question.
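
For readers who want to see what “shared weights and pooling” mean mechanically, here is a toy 1-D numpy sketch (sizes and values are arbitrary): one small kernel is reused at every position of the input, and pooling then keeps only the strongest local response.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=16)     # a toy 1-D "retina"
kernel = rng.normal(size=3)      # ONE set of weights, shared across positions

# Shared weights: the same kernel is slid over every position of the input,
# unlike a fully connected layer where each position gets its own weights.
conv = np.array([signal[i:i + 3] @ kernel for i in range(len(signal) - 2)])

# Pooling: keep only the strongest response in each local window,
# which buys a degree of translation invariance.
pooled = conv[: len(conv) // 2 * 2].reshape(-1, 2).max(axis=1)
```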

(The second) is highly biologically plausible networks like Izhikevich’s, the Blue Brain Project, and others. Izhikevich’s model is built from multi-compartment, high-fidelity neurons displaying all the alternative modes of neural/ganglia interaction[3]. Among the results, my personal favourite is: “Network exhibits sleeplike oscillations, gamma (40 Hz) rhythms, conversion of firing rates to spike timings, and other interesting regimes. Due to the interplay between the delays and STDP, the spiking neurons spontaneously self-organize into groups and generate patterns of stereotypical polychronous activity. To our surprise, the number of coexisting polychronous groups far exceeds the number of neurons in the network, resulting in an unprecedented memory capacity of the system.”
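
The networks behind those results use far richer neurons than I can reproduce here, but Izhikevich’s well-known two-variable point-neuron model gives a flavour of the spiking dynamics involved. A minimal sketch with the standard regular-spiking parameter set and a constant input current chosen arbitrarily for illustration:

```python
# Izhikevich's simple spiking-neuron model (regular-spiking parameters).
# This is only the two-variable point-neuron version, simulated with
# 1 ms Euler steps (split into two half-steps for numerical stability).
a, b, c, d = 0.02, 0.2, -65.0, 8.0
v, u = -65.0, b * -65.0
I = 10.0                          # constant input current (illustrative)
spikes = []

for t in range(1000):             # simulate 1000 ms
    if v >= 30.0:                 # spike: record it and reset
        spikes.append(t)
        v, u = c, u + d
    v += 0.5 * (0.04 * v**2 + 5 * v + 140 - u + I)
    v += 0.5 * (0.04 * v**2 + 5 * v + 140 - u + I)
    u += a * (b * v - u)

print(f"{len(spikes)} spikes in 1 s")
```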

(The third) is Hierarchical Temporal Memory by Jeff Hawkins. It’s a framework inspired by the principles of the neocortex. It claims that the role of the neocortex is to integrate upstream sensory data and then find patterns within the combined stream of neural activity. It views the neocortex as an auto-association machine (a view I at least partially endorse). HTM was developed almost two decades ago but, to the best of my knowledge, has failed to earn much recognition. Still, it’s the best model of this type, so it is worth considering.
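
To be clear, the sketch below is not HTM’s actual machinery (sparse distributed representations, spatial pooling, temporal memory); it is only a toy illustration of what “auto-association” means, using a classic Hopfield-style network that completes a stored pattern from a partial cue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store a few random +/-1 patterns with the Hopfield (outer-product) rule.
n = 64
patterns = rng.choice([-1, 1], size=(3, n))
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

# Auto-association: corrupt half of one stored pattern, then let the
# network settle; it recalls the full pattern from the partial cue.
state = patterns[0].copy()
state[: n // 2] = rng.choice([-1, 1], size=n // 2)
for _ in range(10):
    state = np.sign(W @ state)
    state[state == 0] = 1

print("overlap with stored pattern:", int(state @ patterns[0]))  # ~n if recalled
```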

  1. ^
  2. ^ Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. https://ieeexplore.ieee.org/abstract/document/726791
  3. ^