I think I have expressed my views on the matter of responsibility quite clearly in the conclusion.
I just checked Yudkowsky on Google. He founded this website, so that is good.
This is not the place to argue my views on super-intelligence, but I clearly side with Russell and Norvig. Life is just too complex, luckily.
As for safety, the title of Jessica Taylor’s article is:
“Quantilizers: A Safer Alternative to Maximizers for Limited Optimization”.
I will just be glad to have proved that alternative to be effective.
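For readers who have not seen that paper: a q-quantilizer does not take the utility argmax; it samples from the top q fraction of a base distribution, ranked by utility. A minimal sketch of the idea in Python (the function name and signature are mine, for illustration only; this is not the mechanism from my code):

```python
import numpy as np

def quantilize(actions, utility, base_prob, q=0.1, rng=None):
    """Sample from the top-q fraction (ranked by utility) of a base
    distribution, instead of taking the utility argmax. Staying close
    to the base distribution is the safety argument of the paper."""
    rng = rng or np.random.default_rng()
    order = sorted(range(len(actions)), key=lambda i: -utility(actions[i]))
    kept, mass = [], 0.0
    for i in order:            # walk from best to worst action...
        kept.append(i)
        mass += base_prob[i]
        if mass >= q:          # ...until q of the base mass is covered
            break
    p = np.array([base_prob[i] for i in kept])
    idx = rng.choice(kept, p=p / p.sum())   # sample within the kept set
    return actions[idx]
```

With q = 1 this is just the base distribution; as q goes to 0 it collapses to the maximizer.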
Update: 3 runs (2 random), 10 million steps each. All three ended over 88.33% (average over steps 9.5-10.5 million across the 3 runs: 88.43%). A new SOTA? Please check and update.
Update 2: 89.85% at step 50 million with quantUpP = 3.2 and quantUpN = 39. It does perform very well; I will leave it at that. As I said in my post, those are the two important parameters (no, it is not a universal super-intelligence in 600 lines of code). Be rational, and think about what it means that this mechanism works so well (I am talking to everybody here).
I looked at it the informed way.
It gets over 88% with very limited effort.
As I pointed out, the two datasets are similar in their technical description, but they are “reversed” in the data.
MNIST is black dots on a white background. F-MNIST is white dots on a black background. The histograms are very different.
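Anybody can check the histogram difference in a few lines; here is a quick look using the Keras dataset loaders (my choice for convenience, any loader will do):

```python
import numpy as np
from tensorflow.keras.datasets import fashion_mnist, mnist

(x_m, _), _ = mnist.load_data()          # 28x28 grayscale, values 0-255
(x_f, _), _ = fashion_mnist.load_data()  # same format and shape

for name, x in (("MNIST", x_m), ("F-MNIST", x_f)):
    dark = np.mean(x < 32)   # fraction of near-black pixels
    print(f"{name}: mean pixel {x.mean():.1f}, {dark:.0%} of pixels below 32")
```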
I tried to make it work despite that, just with parameter changes, and it does.
Here are the changes to the code:
on line 555: quantUpP = 1.9 ;
on line 556: quantUpN = 24.7 ;
with rand(1000), as it is in the code, you already clear 86% at step 300,000, 87% at step 600,000, and 88% at 3 million.
I had made another, small and irrelevant, change in my full tests, so I am running the full tests again without it (the values/steps above are from that new series). It seems to be better again without it… maybe a new SOTA (update: touched 88.33% at step 4,800,000! … and 88.5% at 6.8 million! MLPs perform poorly when applied to data even slightly more complicated than MNIST).
I do not understand all the hype around MNIST. Once again, this is PI-MNIST, and that makes it very different (to put it simply: no geometry, so no convolution).
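For anybody unclear on the definition: PI-MNIST (permutation-invariant MNIST) is usually operationalised by applying a single fixed random permutation to the 784 pixels of every image, so no model can exploit 2-D geometry. A small sketch of my own, just to make that concrete:

```python
import numpy as np

def to_pi_mnist(images, seed=0):
    """Apply ONE fixed random permutation to the 784 pixels of every
    image. Neighbouring pixels get scattered, so convolutions, which
    rely on local 2-D geometry, lose their advantage."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(28 * 28)         # same permutation for all images
    flat = images.reshape(len(images), -1)  # (N, 784)
    return flat[:, perm]
```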
I would like anybody to give me a reference to some “other method that worked on MNIST but did not make it further” that uses PI-MNIST and gets more than 98.4% on it.
And if anybody tries it on yet another dataset, could they please notify me so I can look at it before they make potentially damaging statements.