Even when contrarians win, they lose: Jeff Hawkins
Related: Even When Contrarians Win, They Lose
I had long thought that Jeff Hawkins (and the Redwood Center, and Numenta) were pursuing an idea that didn’t work, and had continued to fail to give up on it for a prolonged period of time. I formed this belief because I had not heard of any impressive results or endorsements of their research. However, I recently read an interview with Andrew Ng, a leading machine learning researcher, in which he credits Jeff Hawkins with publicizing the “one learning algorithm” hypothesis—the idea that most of the cognitive work of the brain is done by one algorithm. Ng says that, as a young researcher, this pushed him into areas that could lead to general AI. He still believes that AGI is far off, though.
I found out about Hawkins’ influence on Ng after reading an old SL4 post by Eliezer and looking for further information about Jeff Hawkins. It seems that the “one learning algorithm” hypothesis was widely known in neuroscience, but not within AI until Hawkins’ work. Based on Eliezer’s citation of Mountcastle and his known familiarity with cognitive science, it seems that he learned of this hypothesis independently of Hawkins. The “one learning algorithm” hypothesis is important in the context of intelligence explosion forecasting, since hard takeoff is vastly more likely if it is true. I have been told that further evidence for this hypothesis has been found recently, but I don’t know the details.
This all fits well with Robin Hanson’s model. Hawkins had good evidence that better machine learning should be possible, but the particular approaches he took didn’t perform as well as less biologically inspired ones, so he’s not really recognized today. Deep learning would definitely have happened without him; there were already many people working in the field, and they started to attract attention because of improved performance due to a few tricks and better hardware. At least a part of Ng’s career, though, can be credited to Hawkins.
I’ve been thinking about Robin’s hypothesis a lot recently, since many researchers in AI are starting to think about the impacts of their work (though most still think only about near-term societal impacts rather than about superintelligence). They recognize that this shift towards thinking about societal impacts is recent, but they have no idea why it is occurring. They know that many people, such as Elon Musk, have been outspoken about AI safety in the media recently, but few have heard of Superintelligence, or attribute the recent change to FHI or MIRI.
I feel like the lesson here is “if your plan depends on a conjunction of propositions, then you might fail even if you are right about some of those propositions and everyone else is wrong.”
I followed Jeff Hawkins’ work closely for a long time. In my view, several things happened, although due to the somewhat secretive nature of the startup, I may be grossly wrong and invite corrections to my viewpoint.
Firstly, their algorithm idea (HTM) was from the start based on a set of limiting simplifying assumptions that made it hard to generalize their work to problem domains outside of computer vision. At approximately the same time that work on HTMs was starting, people in the deep learning community were beginning to work seriously on complicated vision problems, making a large part of HTM moot/obsolete by the time a public release was made. (By the way, deep neural nets and HTM share a lot in common, which is interesting since they were arrived at from very different directions.)
Later, Hawkins and Dileep George (the ‘technical’ lead) had something of a rift: Hawkins emphasized temporal learning, while George wanted to focus more on getting vision right. This led to George leaving Numenta, and Numenta essentially becoming one of many companies offering ‘data mining’ and ‘big data analytics’ services. George, meanwhile, started his own company (Vicarious), focused on human-like computer vision software. Vicarious has not yet released a product, but has acknowledged that its approach uses probabilistic graphical models, which would put it in line with most of the ‘mainstream’ work on the subject.
tl;dr: Numenta’s work was significant but the machine learning field as a whole is moving so rapidly that yesterday’s breakthroughs are today’s mundane trivialities.
Numenta’s stuff made a lot of sense. They kept things simple by removing the recursion of HTMs... and I think that is probably the key to the whole thing working.
All that being said, their latest product, Grok, seems to be having some success in the network monitoring space: http://numenta.com/grok/
On Intelligence was my first intro to the idea of Bayesian thinking.
‘At least a part’? Also,
???
The quote from Ng is
I think it’s pretty clear that he would have worked on different things if not for Hawkins. He’s done a lot of work in robotics, for example, so he could have continued working on robotics if he hadn’t gotten interested in general AI. Maybe he would have moved into deep learning later in his career, as it started to show big results.
Worth mentioning that some parts of Superintelligence are already a less contrarian version of many arguments made here in the past.
Also note that although some people do believe that FHI is in some sense “contrarian”, when you look at the actual hard data, FHI has been able to publish in mainstream journals (within philosophy, at least) and to reach important mainstream researchers (within AI, at least) at rates comparable to, if not higher than, those of excellent “non-contrarian” institutes.
Yeah, I didn’t mean to contradict any of this. I wonder how much of a role previous arguments from MIRI and FHI played in changing the zeitgeist and contributing to the way Superintelligence was received. There was a slow increase in uninformed fear-of-AI sentiment over the preceding years, which may have put people in more of a position to consider the arguments in Superintelligence. I think that much of this ultimately traces back to MIRI and FHI; for example, many anonymous internet commenters refer to them or use phrasing inspired by them, though many others don’t. I’m more sceptical that this change in zeitgeist was helpful, though.
Of course, specific people who interacted with MIRI/FHI more strongly, such as Jaan Tallinn and Peter Thiel, were helpful in bringing the discourse to where it is today.
I also read On Intelligence and it had a large impact on my reading habits. I was not previously aware that Andrew Ng had a similar experience, which leads me to wonder how many people became interested in neuroscience as a result of that one book.
On a side note: the only significance of Andrew Ng’s stated belief that AGI is far off is as an indicator that he doesn’t see a route to get there in the near term. Relatedly, he gave a kind of weird comment recently at the end of a conference talk, to the effect of “Worrying about the dangers of machine superintelligence today is like worrying about overpopulation on Mars.”
In one sense, the “one learning algorithm” hypothesis should not seem very surprising. In the fields of AI/machine learning, essentially all practical learning algorithms can be viewed as some approximation of general Bayesian inference (yes—this includes stochastic gradient descent). Given a utility function and a powerful inference system, defining a strong intelligent agent is straightforward (general reinforcement learning, AIXI, etc.)
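To make the stochastic gradient descent point concrete, here is a minimal sketch (the synthetic data, the L2 penalty `lam`, and the step size `lr` are illustrative choices of mine, not anything from the comment above): running SGD on an L2-regularized squared loss is MAP estimation under a Gaussian prior, which is itself a point-estimate approximation to full Bayesian inference over the parameters.

```python
import numpy as np

# Illustration only: SGD on an L2-regularized squared loss performs MAP
# estimation under a Gaussian prior, i.e. a point approximation to
# Bayesian inference over the weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # synthetic inputs
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(5)
lam, lr = 1e-2, 1e-2                   # lam plays the role of a Gaussian prior precision
for _ in range(20000):
    i = rng.integers(len(X))                    # single-example stochastic gradient
    grad = (X[i] @ w - y[i]) * X[i] + lam * w   # per-example NLL gradient plus prior/regularizer term
    w -= lr * grad

# SGD hovers near the MAP solution of the regularized objective:
w_map = np.linalg.solve(X.T @ X + len(X) * lam * np.eye(5), X.T @ y)
print(w, w_map)
```

The Gaussian-prior/L2 pairing is used only because it makes the prior-regularizer correspondence exact; with other losses and regularizers the “approximation of Bayesian inference” reading is looser, but the same MAP-as-point-estimate picture applies.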
The difficulty, of course, is in scaling up practical inference algorithms to compete with the brain. One of the older views in neuroscience was that the brain employed a huge number of specialized algorithms that had been fine-tuned in deep time by evolution: specialized vision modules, audio modules, motor modules, language modules, and so on. The novelty of the “one learning algorithm” hypothesis is the realization that all of that specialization is not hardwired, but is instead the lifetime-accumulated result of a much simpler general learning algorithm.
On Intelligence is a well-written pop-sci book about a very important new development in neuroscience. However, Hawkins’ particular implementation of the general ideas—his HTM stuff—is neither groundbreaking, theoretically promising, nor very effective. There are dozens of unsupervised generative model frameworks that are more powerful in theory and in practice (as one example, look into any of Bengio’s recent work), and HTM itself has had little impact on machine learning.
I wonder also about Hassabis (co-founder of DeepMind), who studied computational neuroscience and then started a deep learning company: did he read On Intelligence? Regardless, you can see the flow of influence in how deep learning papers cite neuroscience.
I downvoted this post because it is basically meta discussion built on arguments from authority and tribalism: Andrew Ng and MIRI == good; it turns out Jeff Hawkins influenced Ng and shares some conceptual ideas with MIRI; therefore Hawkins == good. That’s faulty reasoning, which has the capability to reinforce wrong beliefs.
Tell me, what about Hawkins/Numenta’s work makes it wrong or right on its own merits? Why is it, or isn’t it, likely to lead to capable general-purpose intelligences?
I didn’t see the post in that light at all. I think it gave a short, interesting, and relevant example about the dynamics of intellectual innovation in “intelligence research” (Jeff) and how this could help predict and explain the impact of current research (MIRI/FHI). I do agree the post is about “tribalism” and not about the truth; however, it seems that this was the OP’s explicit intention and a worthwhile topic. It would be naive and unwise to overlook these sorts of societal considerations if your goal is to make AI development safer.
As far as I can tell, you’ve misunderstood what I was trying to do with this post. I’m not claiming that Hawkins’ work is worth pursuing further; passive_fist’s analysis seems pretty plausible to me. I was just trying to give people some information that they may not have on how some ideas developed, to help them build a better model of such things.
(I did not downvote you. If you thought that I was arguing for further work towards Hawkins’ program, then your comment would be justified, and in any case this is a worthwhile thing for me to explicitly disclaim.)