D𝜋′s Spiking Network

Update: The algorithm performs poorly when applied to data even slightly more complicated than MNIST. Update #3: But it can get to just under 90% with tuning.


Update #2: There are a few mistakes in my description. See here for details.


D𝜋 posted an interesting article full of original research Self-Organised Neural Networks: A simple, natural and efficient way to intelligence. D𝜋 describes his invention as a “message in a bottle”.

Is it real? I don’t know yet, but I have publicly registered my belief that there exist simple algorithms which are vastly superior to the multilayer perceptron. D𝜋′s algorithm lit up my heuristics as something worth looking into. D𝜋′s open source code compiles, runs, and claims[1] to perform well on the standard permutation-invariant dataset PI-MNIST. D𝜋′s work is precise enough to be falsifiable.

[A]t least a wild goose chase gives you some exercise.

In the Beginning was the Command Line by Neal Stephenson

This article is my attempt to communicate D𝜋′s theory in my own words. I have applied my own names to many of D𝜋′s ideas. These names are not yet fixed. I may change them.

Spiking Networks

D𝜋 calls his invention a Self-Organized Neural Network (SONN). I prefer to call it a Spiking Neural Network (SNN) or spiking network for short. In this article, I will use the term “spiking network” to refer exclusively to D𝜋′s invention. (I ignore the fact that other people have been working on SNNs for decades.)

Most artificial neural networks (ANNs) in use today are based on the multilayer perceptron. The multilayer perceptron is designed to handle continuous data. You can feed discrete data into a multilayer perceptron by embedding it into a vector space. A spiking network works the opposite way. It accepts only binary values. If you want to feed continuous data into a spiking network you need to discretize it into binary values first. D𝜋 buckets MNIST’s greyscale training data into four shades of grey per pixel. Our system thus has four receptors per pixel. Exactly one receptor activates per pixel per image.
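
For concreteness, here is a minimal sketch of that bucketing in Python. The evenly spaced bucket edges and the [0, 1] pixel range are my assumptions; D𝜋's actual thresholds may differ.

```python
import numpy as np

def to_receptors(image, n_buckets=4):
    """Discretize a greyscale image (pixel values in [0, 1]) into one-hot
    receptors: n_buckets binary receptors per pixel, exactly one of which
    is active per pixel. The evenly spaced bucket edges are an assumption."""
    pixels = np.asarray(image, dtype=float).reshape(-1)
    buckets = np.minimum((pixels * n_buckets).astype(int), n_buckets - 1)
    receptors = np.zeros((pixels.size, n_buckets), dtype=np.uint8)
    receptors[np.arange(pixels.size), buckets] = 1   # one-hot per pixel
    return receptors.reshape(-1)
```

A 28×28 MNIST image then becomes a binary vector of 28 × 28 × 4 = 3136 receptors.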

The MNIST dataset contains images of handwritten digits. There are ten digits, 0123456789. D𝜋 creates ten groups of neurons, one for each digit. He calls them “groups”. I will call them columns instead.

D𝜋′s spiking network as presented utilizes no hidden layers. Neurons do not read each other’s outputs. D𝜋′s ten columns of neurons are connected directly to the receptors. It may be possible to improve D𝜋′s spiking network by adding hidden layers. We will not do so in this article. D𝜋′s spiking network performs adequately on MNIST without hidden layers.

Each neuron is connected to “a few” receptors of input data. Each connection between a neuron and a receptor has a weight. When a receptor is on, the weights associated with its connections are added to the potentials of their respective neurons. Since receptors are binary, no multiplication is necessary.

The potential of a neuron is a non-negative number representing the neuron’s internal state. Neuron potentials start at zero. When a neuron’s potential passes a threshold value, the neuron spikes and the neuron’s potential is reset to zero. Our system counts spikes.

To classify a digit, we first preprocess the pixels into receptor activations. We use the receptor activations to determine which weights to add to the neuron potentials. Some of the neurons spike[2]. We count the spikes in each column. Whichever column has the most spikes is our prediction.
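
Here is a rough sketch of that forward pass as I understand it. The data layout (connections stored as lists of [receptor, weight] pairs, a column_of lookup table) is mine, not D𝜋's, and the sketch lets a neuron spike more than once per image, a choice footnote [2] leaves open.

```python
import numpy as np

def classify(active_receptors, connections, column_of, threshold):
    """Run one image through the network and return the predicted digit.

    Hypothetical data layout (mine, not D𝜋's):
      active_receptors : set of receptor indices that are on for this image
      connections[j]   : list of [receptor_index, weight] pairs for neuron j
      column_of[j]     : which of the ten columns neuron j belongs to
      threshold        : potential at which a neuron spikes and resets to zero
    """
    spikes = np.zeros(10, dtype=int)
    for j, conns in enumerate(connections):
        potential = 0.0
        for receptor, weight in conns:
            if receptor not in active_receptors:
                continue
            # Add the weight; keep the potential non-negative as described above.
            potential = max(0.0, potential + weight)
            if potential >= threshold:
                # This sketch lets a neuron spike more than once per image;
                # footnote [2] leaves that choice open.
                spikes[column_of[j]] += 1
                potential = 0.0
    return int(np.argmax(spikes)), spikes
```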

Learning

The spiking network learns in three ways.

  • The spiking network can adjust the weights of the active connections. We reinforce a column by adding a fixed value to all of its active connections. We negatively reinforce a column by subtracting a fixed value from all of its active connections. We may, optionally, impose a maximum absolute value on each connection beyond which it cannot grow. (A sketch of this and the decay rule follows the list.)

  • All connections slowly decay toward zero over time. Positive weights go down. Negative weights go up.

  • Connections grow and are pruned.
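
A minimal sketch of the first two mechanisms, reinforcement and decay, under the same hypothetical data layout as the forward pass above. The function names and the capping behaviour are my framing; growth and pruning are sketched in their own section below.

```python
def reinforce_column(connections, neurons_in_column, active_receptors,
                     amount, w_max=None):
    """Add `amount` (negative for negative reinforcement) to every active
    connection of the neurons in one column. w_max, if given, caps |weight|."""
    for j in neurons_in_column:
        for conn in connections[j]:
            receptor, weight = conn
            if receptor in active_receptors:
                weight += amount
                if w_max is not None:
                    weight = max(-w_max, min(w_max, weight))
                conn[1] = weight

def decay_weights(connections, rate):
    """Move every weight a small step toward zero: positive weights go down,
    negative weights go up, and none overshoot past zero."""
    for conns in connections:
        for conn in conns:
            w = conn[1]
            conn[1] = max(0.0, w - rate) if w > 0 else min(0.0, w + rate)
```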

The question to ask is not ‘how’ to learn, but ‘when’.

The column representing the true digit is called the [true] column. All other columns are called [false] columns. Columnar dominance is a relative measure. We do not care about the absolute spike count of a column. We care about whether the [true] column has more spikes than the highest [false] column.

The [true] column and the [false] columns are reinforced according to separate rules.

  • Training the [true] column. Whenever we present a sample, we calculate the difference between the spike count of the [true] column and the maximum spike count of the [false] columns. We remember every such difference. If the current difference is in the top 5% of all those seen so far, we perform a positive reinforcement of the [true] column.

  • Training each [false] column. Whenever we present a sample, we calculate the difference between the spike count of the [false] column and the average spike count of all the columns. We remember every such difference. If the current difference is in the top 0.66% of those seen so far, we perform a negative reinforcement of that [false] column. (A sketch of both rules follows this list.)
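
A sketch of the two rules as stated above, under my assumptions. This version literally remembers every difference and recomputes the quantile with numpy; D𝜋 replaces that memory with the quantilizer described in the next section. Whether the [false] columns share one history or keep one each is my guess.

```python
import numpy as np

def training_decisions(spikes, true_digit, history_true, history_false):
    """Decide which columns to reinforce for one sample.

    Returns (reinforce_true, columns_to_punish). history_true and
    history_false are lists of every difference seen so far."""
    spikes = np.asarray(spikes)
    false_spikes = np.delete(spikes, true_digit)

    # [true] column: its spike count minus the best [false] column's.
    d_true = spikes[true_digit] - false_spikes.max()
    history_true.append(d_true)
    reinforce_true = d_true >= np.quantile(history_true, 0.95)     # top 5%

    # Each [false] column: its spike count minus the average over all columns.
    columns_to_punish = []
    for digit in range(len(spikes)):
        if digit == true_digit:
            continue
        d_false = spikes[digit] - spikes.mean()
        history_false.append(d_false)
        if d_false >= np.quantile(history_false, 1 - 0.0066):      # top 0.66%
            columns_to_punish.append(digit)
    return reinforce_true, columns_to_punish
```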

Many variations are possible. The only way to figure out what works best is via experiment.

Quantilizers

D𝜋 doesn’t actually remember every difference. That would be expensive. Instead, D𝜋 uses a simple algorithm called a quantilizer.

Stand in the street with a vertical measuring stick. Place the mark at the bottom. When somebody passes by, if they are taller than the mark, up the mark one notch. If they are shorter, down the mark one notch. As people pass by, it will go up and down.

Think about it: where will it converge to?

That is correct (most people figure it out): It will be the average height of the passers-by. Actually, it is the median height. But for heights, a Gaussian distribution, it is the same.

If (instead of moving up one notch per tall person) you move up nineteen notches per tall person (but you still move down one notch per short person) then your quantilizer will converge to the top 5% instead of the top 50%.
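
Here is what that streaming estimator might look like in code. The notch size, the starting mark, and the class interface are my choices; only the up-nineteen-down-one rule comes from the description above.

```python
class Quantilizer:
    """Streaming quantile estimate: the measuring stick in the street.

    Move the mark up `up` notches when a sample lands above it and down one
    notch when a sample lands below it. The mark converges to the level with
    a 1/(up + 1) fraction of samples above it, so up=19 tracks the top 5%.
    The notch size and starting mark are arbitrary choices here."""

    def __init__(self, up=19, notch=0.01, start=0.0):
        self.mark = start    # current estimate of the quantile
        self.up = up         # notches moved per sample above the mark
        self.notch = notch   # size of one notch

    def observe(self, x):
        """Feed one sample and return the updated mark."""
        if x > self.mark:
            self.mark += self.up * self.notch
        elif x < self.mark:
            self.mark -= self.notch
        return self.mark

    def is_above(self, x):
        """Is x above the current mark, i.e. in the estimated top fraction?"""
        return x > self.mark
```

A `Quantilizer(up=19)` could stand in for the top-5% check on the [true] column’s differences, and something like `up=150` would sit near the top 0.66% for the [false] columns (since 1/151 ≈ 0.66%).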

Neuron Growth & Pruning

The system starts with no connections. Each neuron starts with open slots for connections. When a sample is viewed, a small number of connections with random weights are grown from random receptors to random neurons with open slots. When the absolute value of a weight drops below a minimum threshold, the connection is pruned and its slot opens up.
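
A sketch of growth and pruning under the same hypothetical layout as before. The number of new connections per sample, the initial weight range, and the per-neuron slot bookkeeping are my guesses, not D𝜋's numbers.

```python
import random

def grow_connections(connections, free_slots, n_receptors, n_new):
    """Grow a few connections with random weights from random receptors to
    random neurons that still have open slots."""
    candidates = [j for j, slots in enumerate(free_slots) if slots > 0]
    for _ in range(n_new):
        if not candidates:
            break
        j = random.choice(candidates)
        connections[j].append([random.randrange(n_receptors),
                               random.uniform(-0.1, 0.1)])  # hypothetical range
        free_slots[j] -= 1
        if free_slots[j] == 0:
            candidates.remove(j)

def prune_connections(connections, free_slots, w_min):
    """Drop connections whose |weight| has fallen below w_min; their slots
    open up again for future growth."""
    for j, conns in enumerate(connections):
        kept = [c for c in conns if abs(c[1]) >= w_min]
        free_slots[j] += len(conns) - len(kept)
        connections[j] = kept
```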


  1.

    I have not examined D𝜋′s code closely enough to confirm it contains nothing fraudulent. I am taking D𝜋′s article at face value. It seems like too much work for a practical joke.

  2.

    Should a neuron be allowed to spike multiple times for a single image? I don’t know.