Oh God, a few quick points:
Require some proof of knowledge of the people you pay attention to. At least academic success, but ideally market success. This guy has been peddling his ideas for over 10 years with no results or buy-in.
His conception of the ML field is distorted, in that for his points to stand one has to ignore the last 10-20 years of RNN R&D.
Even assuming he made a more sophisticated point, there’s hardly any reason to believe brain designs are the pinnacle of efficiency; indeed, they likely aren’t, but building digital circuits or quantum computers via evolution might require too much slack.
I don’t like this way of beginning comments. Just make your substantive criticism without the expression of exasperation.
(I’m having a little trouble putting into words exactly why this bothers me. Is it wrong to feel exasperated? No. Is it wrong to express it? No. But this particular way of expressing it feels impolite and uncharitable to me.
I think saying, “I think this is wrong, and I’m frustrated that people make this kind of mistake”, or something like that, would be okay—you’re expressing your frustration, and you’re owning it.
But squeezing that all into just, “Oh God” feels too flippant and dismissive. It feels like saying, “O geez, I can’t believe you’ve written something so dumb. Let me correct you...” which is just not how I want us to talk to each other on this forum.)
Yeah, bad choice of words in hindsight, especially since I was criticizing the subject of the article, not necessarily its contents.
But now there are two comments which are in part reacting to the way my comment opens, so by editing it I’d be confusing any further reader of this discussion, if there ever is one.
So I think it’s the lesser of two evils to leave it as is.
Require some proof of knowledge of the people you pay attention to. At least academic success, but ideally market success.
Sure, I mean, that’s not a bad idea for people who won’t or can’t use their judgment to sort good ideas from bad, but I don’t think that applies to me in this case. I mean, it’s not like Jeff Hawkins is a fringe crackpot or anything; I think his papers get at least as many citations as your average university neuroscience professor’s, I think his book was pretty influential and well-regarded, and he interacts regularly with university neuroscientists who seem to take his ideas seriously and aren’t ashamed to be interacting with him, etc. I certainly don’t take his ideas as gospel truth! And I never did. In this post I tried to relay some of his ideas without endorsing them. If I were writing this article today (18 months later), I would have, and would express, a lot more opinions, including negative ones.
His conception of the ML field is distorted
Strong agree. He doesn’t really know any ML, and seems to struggle with other algorithms too. He’s one of the people who say “Intelligence is obviously impossible unless we faithfully copy the brain, duh”, which I don’t agree with.
I do think there’s an argument that more neocortex-like algorithms in general, and the knowledge gained from scientists studying the neocortex in particular, will turn out to be very relevant to eventual AGI development, even more so than, say, GPT-3. I myself made that argument here. But it’s a complicated and uncertain argument, especially since deep neural nets can interface with pretty much any other data structure and algorithm and people still call it a success for deep neural nets. (“Oh, causal models are important? OK, let’s put causal models into PyTorch...”)
That said, I don’t think his sequence memory thing is quite like an RNN. You should really read Dileep George’s papers about sequence memory rather than Jeff Hawkins’s, if you have an ML background and don’t care about dendritic spikes and whatnot. Dileep George calls the algorithm “cloned hidden Markov model”, and seems to get nice results. He can train it either by MAP (maximum a posteriori) or by SGD. I don’t know if it’s won any benchmarks or anything, but it’s a perfectly respectable and practical algorithm that is different from RNNs, as far as I can tell.
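(To make the clone idea concrete, here is a minimal sketch in my own words, not code from the paper: every hidden state deterministically emits exactly one symbol, each symbol owns a block of hidden “clones”, and only the transition matrix carries the learned sequence structure. The alphabet size, clone count, and random initialization below are placeholder assumptions.)

```python
# Minimal sketch (my own illustration, not the paper's code) of a cloned HMM:
# emission is deterministic, so all sequence structure lives in the transitions.
import numpy as np

n_symbols = 4           # observed alphabet size (assumed toy value)
clones_per_symbol = 10  # hidden clones allotted to each symbol
n_states = n_symbols * clones_per_symbol

def clone_slice(symbol):
    """Indices of the hidden states that emit this symbol."""
    return slice(symbol * clones_per_symbol, (symbol + 1) * clones_per_symbol)

rng = np.random.default_rng(0)
T = rng.random((n_states, n_states))
T /= T.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
pi = np.full(n_states, 1.0 / n_states)   # uniform prior over hidden states

def log_likelihood(seq):
    """Scaled forward algorithm, restricted to each symbol's clone block."""
    sl = clone_slice(seq[0])
    alpha = np.zeros(n_states)
    alpha[sl] = pi[sl]                   # deterministic emission: mass only on clones
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for symbol in seq[1:]:
        alpha = alpha @ T                # propagate belief through transitions
        masked = np.zeros(n_states)
        sl = clone_slice(symbol)
        masked[sl] = alpha[sl]           # keep only clones of the observed symbol
        z = masked.sum()
        logp += np.log(z)
        alpha = masked / z               # rescale to avoid numerical underflow
    return logp

print(log_likelihood([0, 1, 2, 3, 0, 1]))  # log P(sequence) under the random T
```

Training would then amount to re-estimating T from expected clone-to-clone transition counts (the MAP/EM route) or doing gradient ascent on this log-likelihood (the SGD route).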
there’s hardly any reason to believe brain designs are the pinnacle of efficiency; indeed, they likely aren’t
I think one can make an argument (and again I did here) that certain aspects of brain designs are AGI-relevant, but it has to be argued, not assumed from some a priori outside-view, like Jeff Hawkins does. I certainly don’t expect our future AGIs to have firing neurons, but I do think (>50%) that they will involve components that resemble some higher-level aspects of brain algorithms, either by direct bio-inspiration or by converging to the same good ideas.
I don’t particularly disagree with anything you said here; my reaction was directed more at the subject of the article than at the article itself.
Well, I deeply disagree with the idea of using reason to judge the worth of an idea; as a rule of thumb I think that’s irrational, but that’s not really relevant here.
Anyway, HMMs are something I was unaware of. I just skimmed the Dileep George paper and it looks interesting; the only problem is that it’s compared with what I’d call a “two generations old” language model, in the form of a char-RNN.
I’d actually be curious to try replicating those experiments and use the occasion to benchmark against some “Attention Is All You Need”-style (Transformer) models. I’ll do my own digging beforehand, but since you seem to be familiar with this area, any clue if there’s some follow-up work to this that I should be focusing on instead?
I strongly suspect that the cloned hidden Markov model is going to do worse in any benchmark where there’s a big randomly-ordered set of training / testing data, which I think is typical for ML benchmarks. I think its strength is online learning and adapting in a time-varying environment (which of course brains need to do), e.g. using this variant. Even if you find such a benchmark, I still wouldn’t be surprised if it lost to DNNs. Actually I would be surprised if you found any benchmark where it won.
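(To gesture at why, here’s a generic count-based sketch, not Numenta’s or anyone else’s actual system: when the model is just transition counts, each new observation is a cheap incremental update, and a slow decay lets it track a drifting environment without retraining from scratch. The numbers below are made up.)

```python
# Generic illustration (not anyone's actual method) of online adaptation:
# each observed transition bumps a count, and a slow decay forgets stale
# statistics, so the model follows a time-varying environment on the fly.
import numpy as np

n_states, decay = 40, 0.999
counts = np.ones((n_states, n_states))    # pseudo-counts act as a weak prior

def observe(prev_state, next_state):
    counts[:] = counts * decay             # gradually forget old evidence
    counts[prev_state, next_state] += 1.0  # incorporate the new transition

def transition_matrix():
    return counts / counts.sum(axis=1, keepdims=True)
```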
I take (some) brain-like algorithms seriously for reasons that are not “these algorithms are proving themselves super useful today”. Vicarious’s robots might change that, but that’s not guaranteed. Instead there’s a different story which is “we know that reverse-engineered high-level brain algorithms, if sufficiently understood, can do everything humans do, including inventing new technology etc. So finding a piece of that puzzle can be important because we expect the assembled puzzle to be important, not because the piece by itself is super useful.”
The point of benchmarking something is not necessarily to see if it’s “better”, but to see how much worse it is.
For example, a properly tuned FCNN will almost always beat a gradient booster on a mid-sized problem (say, < 100,000 features once you bucketize your numbers and one-hot encode your categories, since a gradient booster will require that, and < 100,000 samples).
But gradient boosting has many other advantages: training time, stability, ease of tuning, efficient ways of fitting on both CPUs and GPUs, more flexibility in trading off compute against memory usage, metrics for feature importance, potentially faster inference-time logic, and potentially easier online training (though the last two are arguable and kind of beside the point; they aren’t the main advantages).
So really, as long as benchmarks tell me a gradient booster is usually just 2-5% worse than a finely tuned FCNN on this imaginary set of “mid-sized” tasks, I’d jump at the option to never use FCNNs here again, even if the benchmarks seemingly came out “against” gradient boosting.
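(A rough sketch of the kind of head-to-head I mean, on a synthetic dataset with essentially untuned models, so the sizes and settings below are placeholders rather than a real benchmark:)

```python
# Rough sketch of the comparison I have in mind: a gradient booster vs. a
# small fully-connected net on a synthetic "mid-sized" tabular problem.
# Dataset shape and hyperparameters are arbitrary placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=20_000, n_features=100,
                           n_informative=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "gradient boosting": HistGradientBoostingClassifier(random_state=0),
    "FCNN (MLP)": MLPClassifier(hidden_layer_sizes=(256, 128), random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    # The interesting number is the size of the gap, not just who wins.
    print(f"{name}: test accuracy = {acc:.3f}")
```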
I guess I should add: an example I’m slightly more familiar with is anomaly detection in time-series data. Numenta developed the “HTM” brain-inspired anomaly detection algorithm (actually Dileep George did all the work back when he worked at Numenta, I’ve heard). Then I think they licensed it into a system for industrial anomaly detection (“the machine sounds different now, something may be wrong”), but it was a modular system, so you could switch out the core algorithm, and it turned out that HTM wasn’t doing better than the other options. This is a vague recollection, I could be wrong in any or all details. Numenta also made an anomaly detection benchmark related to this, but I just googled it and found this criticism. I dunno.