Ex-software developer, ex-QA engineer. Currently an independent AI Safety researcher.
Prior to working in industry, I was involved in academic research on cognitive architectures. I’m a generalist with a focus on human-like AIs (I know a thing or two about developmental psychology, cognitive science, ethology, and computational models of the mind).
Personal research vectors: the ontogenetic curriculum and narrative theory. The primary theme is consolidating insights from various mind-related areas into a plausible explanation of human value dynamics.
A long-time LessWronger (~8 years). Mostly been active in the local LW community (as a consumer and as an organizer).
Recently I organised a sort of peer-to-peer accelerator for anyone who wants to become an AI Safety researcher. Right now there are 17 of us.
Was a part of AI Safety Camp 2023 (Positive Attractors team).
Open to funding. For the past 7 months I have self-funded my research.
A Thousand Narratives. Theory of Cognitive Morphogenesis
Part 6⁄20. Artificial Neural Networks
Artificial Neural Networks are the face of modern artificial intelligence and its most successful branch. But success, unfortunately, doesn’t imply biological plausibility. Even though most ML algorithms were inspired by aspects of biological neural networks, the final models end up pretty far from the source material. This makes their usefulness for the quest of reverse-engineering the mind questionable. What I mean here is that almost no insights can be brought directly back to neuroscience to help with the research. I’ll explain why in a bit. (Note that this doesn’t mean they cannot serve as inspiration; that is very much possible and, I’m sure, a good idea.)
There are three main show-stoppers:
(Reason #1) is the use of an implausible learning algorithm (read: backpropagation). There have been numerous attempts to find a biological analogue of backpropagation, but as far as I know, all of them fell short. The core objection to the biological plausibility of backpropagation is that weight updates in multi-layered networks require access to non-local information (i.e., error signals generated by units many layers downstream). In contrast, plasticity in biological synapses depends primarily on local information (i.e., pre- and post-synaptic neuronal activity)[1].
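The locality contrast can be made concrete with a toy sketch (my own illustration, not drawn from any of the cited papers): a Hebbian-style update needs only the activity of the two neurons a synapse connects, while the backprop update for the very same weight needs an error signal propagated back from a layer that synapse has no physical contact with.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: x -> h -> y
W1 = rng.normal(size=(3, 4))   # input -> hidden weights
W2 = rng.normal(size=(4, 2))   # hidden -> output weights

x = rng.normal(size=3)
h = np.tanh(x @ W1)
y = h @ W2
target = np.array([1.0, -1.0])

lr = 0.01

# Local (Hebbian-style) update for W1: each synapse (i, j) uses only
# its own pre-synaptic activity x[i] and post-synaptic activity h[j].
dW1_local = lr * np.outer(x, h)

# Backprop update for the SAME weights: requires the error at the
# output layer, carried backwards through W2 -- information that is
# non-local to the x -> h synapses.
err_out = y - target                        # lives two layers away
err_hidden = (err_out @ W2.T) * (1 - h**2)  # back-propagated signal
dW1_backprop = -lr * np.outer(x, err_hidden)
```

Both rules produce an update of the same shape, but only the first one could plausibly be computed by the synapse itself.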
(Reason #2) is the fact that ANNs are used to solve “synthetic” problems. The vast majority of ANNs originated in industry, designed to solve some practical real-world problem. For us, this means that the training data used for these models has almost nothing in common with the human ontogenetic curriculum (or any part of it), and hence such models can’t be used for this kind of research.
(Reason #3) is the use of implausible building blocks and network morphology, resulting in implausible neural dynamics (e.g. the use of point neurons instead of full-blown multi-compartment neurons, or the neglect of most modes of neural interaction, such as STDP). We still don’t know how crucial those alternative modes are, but the consensus on this matter is “we need more than we use right now”.
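To make the STDP reference concrete, here is a minimal sketch of the classic pair-based STDP window (the textbook form; the constants are illustrative, not taken from any particular study). Note how the rule is purely local: the sign and size of the weight change depend only on the relative timing of the pre- and post-synaptic spikes at one synapse.

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.1, a_minus=0.12,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP window.

    dt_ms = t_post - t_pre. Pre-before-post (dt > 0) potentiates the
    synapse; post-before-pre (dt < 0) depresses it. The effect decays
    exponentially with the spike-time gap.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    elif dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0
```

For example, `stdp_delta_w(5.0)` is positive (pre fired 5 ms before post), while `stdp_delta_w(-5.0)` is negative.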
However, there are three notable exceptions:
(The first exception) is convolutional neural networks and their successors. Their design was copied from the mammalian visual cortex, and they are considered sufficiently biologically plausible. The success of ConvNets is based on the utilization of design principles specific to the visual cortex, namely shared weights and pooling[2]. The area of applicability of these principles is an open question.
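Both principles fit in a few lines of toy code (a 1D sketch of my own, not taken from the LeCun paper): one small kernel is reused at every position of the input (shared weights), and pooling then keeps only the strongest response per window, giving a little translation invariance.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide ONE shared kernel across the whole signal: every output
    position reuses the same weights (weight sharing)."""
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel for i in range(n)])

def max_pool(x, size=2):
    """Keep only the strongest response in each window (pooling)."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

signal = np.array([0., 0., 1., 1., 0., 0., 1., 1.])
edge_detector = np.array([-1., 1.])   # one shared feature detector

features = conv1d_valid(signal, edge_detector)  # rising edges light up
pooled = max_pool(features)
```

The same two-weight detector finds every rising edge in the signal; a fully connected layer would need to learn a separate detector for each position.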
(The second) is highly biologically plausible networks like Izhikevich’s, The Blue Brain Project, and others. Izhikevich’s model is built from multi-compartment high-fidelity neurons displaying all the alternative modes of neural/ganglia interaction[3]. Among the results, my personal favorite is: “Network exhibits sleeplike oscillations, gamma (40 Hz) rhythms, conversion of firing rates to spike timings, and other interesting regimes. Due to the interplay between the delays and STDP, the spiking neurons spontaneously self-organize into groups and generate patterns of stereotypical polychronous activity. To our surprise, the number of coexisting polychronous groups far exceeds the number of neurons in the network, resulting in an unprecedented memory capacity of the system.”
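The single-neuron model underlying that network is compact enough to reproduce. Below is a minimal Euler simulation of Izhikevich’s two-equation neuron with his standard regular-spiking parameters (a, b, c, d); the polychronization network adds conduction delays and STDP on top of thousands of such units, which this sketch does not attempt.

```python
import numpy as np

def izhikevich(I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0, T=1000, dt=1.0):
    """Simulate one Izhikevich neuron (regular-spiking parameters).

    v' = 0.04 v^2 + 5 v + 140 - u + I
    u' = a (b v - u)
    if v >= 30 mV: spike, then v <- c, u <- u + d
    """
    v, u = c, b * c
    spikes, vs = [], []
    for t in range(T):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            vs.append(30.0)       # clip the spike peak for plotting
            spikes.append(t)
            v, u = c, u + d       # reset after the spike
        else:
            vs.append(v)
    return np.array(vs), spikes

trace, spike_times = izhikevich()
```

With a constant input current of 10, the neuron fires tonically; changing the four parameters switches it between bursting, chattering, and other regimes described in Izhikevich’s papers.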
(The third) is Hierarchical Temporal Memory (HTM) by Jeff Hawkins. It’s a framework inspired by the principles of the neocortex. It claims that the role of the neocortex is to integrate upstream sensory data and then find patterns within the combined stream of neural activity. It views the neocortex as an auto-association machine (a view I at least partially endorse). HTM was developed almost two decades ago but, to the best of my knowledge, failed to earn much recognition. Still, it’s the best model of this type, so it is worth considering.
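“Auto-association” here means recovering a whole stored pattern from a corrupted fragment. A Hopfield network (not HTM itself, just the textbook illustration of the idea) shows the mechanism in a few lines: store a pattern with a Hebbian outer-product rule, corrupt it, and let the dynamics settle back to the stored attractor.

```python
import numpy as np

# Store one binary (+/-1) pattern with the Hebbian outer-product rule.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)          # no self-connections

# Corrupt two entries of the stored pattern.
probe = pattern.copy()
probe[0] = -probe[0]
probe[3] = -probe[3]

# Let the network settle (synchronous sign updates).
state = probe.astype(float)
for _ in range(5):
    state = np.sign(W @ state)

recovered = state.astype(int)     # converges back to `pattern`
```

The corrupted probe falls into the basin of attraction of the stored pattern and is completed, which is the essence of the auto-associative view of the neocortex.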
[1] D. Hassabis et al. Neuroscience-Inspired Artificial Intelligence. https://www.sciencedirect.com/science/article/pii/S0896627317305093
[2] Y. LeCun, Y. Bengio, et al. Gradient-based learning applied to document recognition. https://ieeexplore.ieee.org/abstract/document/726791
[3] E. Izhikevich. Polychronization: Computation with Spikes. https://direct.mit.edu/neco/article-abstract/18/2/245/7033/Polychronization-Computation-with-Spikes