How to Grow a Mind (video)

From the recent NIPS conference, here’s a talk by MIT cognitive scientist Josh Tenenbaum on what he calls “rich” machine learning. It should be relevant to anyone interested in AI or in human cognitive development; I found it really interesting.

http://videolectures.net/nips2010_tenenbaum_hgm/

The gist is: children are able, from a young age, to learn the meanings of words from just a few examples. Adults shown pictures of abstract, made-up objects, and told that some of them are called “tufas,” can pick out which other pictures are tufas and which aren’t. We can do this much faster than a typical Bayesian estimator can, and with far less training data, partly because we have background knowledge about the world and about what sorts of categories and structures it forms.
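
To make the point concrete, here’s a minimal sketch of Bayesian concept learning with the “size principle” from Tenenbaum’s related work: among hypotheses consistent with the examples, the smallest one gains posterior mass fastest, so a handful of examples can pin a concept down. The hypothesis space and uniform prior below are invented for illustration.

```python
# Toy Bayesian concept learner over the integers 1..100 (illustrative only).
# Background knowledge enters as the hypothesis space and prior; the "size
# principle" likelihood (1/|h|)^n makes small consistent hypotheses win quickly.

hypotheses = {
    "even": {n for n in range(1, 101) if n % 2 == 0},
    "powers_of_2": {2, 4, 8, 16, 32, 64},
    "multiples_of_10": set(range(10, 101, 10)),
}
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform background prior

def posterior(examples):
    """P(h | examples), assuming examples are drawn uniformly from the true concept."""
    scores = {}
    for h, members in hypotheses.items():
        consistent = all(x in members for x in examples)
        scores[h] = prior[h] * (1.0 / len(members)) ** len(examples) if consistent else 0.0
    total = sum(scores.values())
    return {h: round(s / total, 3) for h, s in scores.items()}

print(posterior([16]))        # mass spread over "even" and "powers_of_2"
print(posterior([16, 8, 2]))  # three examples all but settle on "powers_of_2"
```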

For instance, we learn fairly young (~2 years) that non-solid objects are defined by substance rather than shape: that “toothpaste” is that sticky substance we brush our teeth with, whether it’s in the toothpaste tube or on the toothbrush or smeared on the sink, all very different shapes in terms of the raw images hitting our retinas. Pour a new liquid called “floo” in a glass, and we’ll predict that it’s still called “floo” when you spill it or while it’s splashing through the air. On the other hand, some objects are defined by shape rather than substance: a chair is a chair no matter what it’s made of. Some sets of concepts fall into tree-like organization (the taxonomy of living things) and some fall into “flat” clusters without hierarchy (the ROYGBIV colors). It takes children three years or so to understand that the same object can be both a dog and a mammal. We learn over time which structural organization is best for which sorts of concepts in the world.
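
As a toy illustration of the tree-like case (my example, not from the talk), agglomerative clustering over a few invented animal features recovers a small taxonomy, while the same algorithm run on something like color hues would produce only shallow, flat groupings:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

animals = ["dog", "cat", "cow", "trout", "salmon", "sparrow"]
# Invented binary features: has_fur, lays_eggs, lives_in_water, can_fly
features = np.array([
    [1, 0, 0, 0],  # dog
    [1, 0, 0, 0],  # cat
    [1, 0, 0, 0],  # cow
    [0, 1, 1, 0],  # trout
    [0, 1, 1, 0],  # salmon
    [0, 1, 0, 1],  # sparrow
])

# Each output row records one merge: [cluster_a, cluster_b, distance, new_size].
# The mammals merge first (distance 0), then the fish; the sparrow joins the
# egg-layers before everything unites -- a small taxonomy read off the features.
print(linkage(features, method="average"))
```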

Research in machine learning, computer vision, statistics, and related fields often focuses on optimizing the description of data given a certain form, without assuming that the machine has any “life experience” or “judgment” about which format is best. Clustering algorithms give the best way to sort data into clusters, if we think it falls into clusters; dimensionality reduction techniques give the best projection onto a low-dimensional subspace, if we think it lies in a subspace; manifold learning techniques give the best fit to a low-dimensional manifold, if we think it lies on a manifold. Less attention is paid to how we identify the best structural model in the first place (and whether that identification can itself be automated, moved from human judgment to machine judgment). See Jordan Ellenberg’s piece in Slate for more on this.
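
As a very narrow sketch of what that automation could look like (my illustration using scikit-learn, not anything from the talk), we can fit two structural hypotheses, a low-dimensional subspace (probabilistic PCA) and flat clusters (a Gaussian mixture), and let held-out likelihood arbitrate between them. Real structural-form discovery, as in Kemp and Tenenbaum’s work, searches a much richer space of forms.

```python
import numpy as np
from sklearn.decomposition import PCA          # probabilistic PCA: subspace hypothesis
from sklearn.mixture import GaussianMixture    # mixture model: cluster hypothesis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data that genuinely falls into three clusters in five dimensions.
centers = rng.normal(scale=5.0, size=(3, 5))
X = np.vstack([c + rng.normal(size=(200, 5)) for c in centers])
X_train, X_test = train_test_split(X, random_state=0)

subspace = PCA(n_components=2).fit(X_train)
clusters = GaussianMixture(n_components=3, random_state=0).fit(X_train)

# score() returns the average per-sample log-likelihood on held-out data;
# the better structural hypothesis assigns the data higher probability.
print("subspace model:", subspace.score(X_test))
print("cluster model: ", clusters.score(X_test))
```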

We usually don’t use completely different techniques in computer vision for identifying pictures of cows vs. pictures of trucks. But there’s evidence that the human brain does exactly that: it applies specialized processing based on the experiential knowledge that cows, trucks, and faces are different sorts of objects, identified by different sets of features.

I’m sympathetic to Tenenbaum’s main point: that we won’t achieve computer counterparts to human learning and sensory recognition until we incorporate experiential knowledge and “learning to learn.” There is no single all-purpose statistical algorithm that Explains All Of Life—we’re going to have to teach machines to judge between algorithms based on context. That seems intuitively right to me, but I’d like to hear some back-and-forth on whether other folks agree.