LDL 7: I wish I had a map

When I was an undergraduate considering graduate school in mathematics, one thing I knew a reasonable amount about, and spent a good deal of time thinking about, was what the field of math looked like overall.

In very, very broad strokes, I knew that geometry, algebra, and analysis were generally different things pursued by different people. I knew that these three were very mature sets of tools at least as much as active research areas, and that other areas of math which might be more naively “interesting,” such as chaos theory or number theory or topology, were generally divided up in the research community based on how amenable certain problems were to tools from each of those three core branches and how much the practitioners liked using each type of tool. I also knew that these tools weren’t mutually exclusive, and that a lot of the interesting progress in my chosen subfield of number theory was driven by rapid movement and conversion between them: for example, viewing an equation as related to a curve geometrically, placing the geometry of that curve in an algebraic structure, and then using analysis to find some solution in this space and trace it back to a solution of the original equation.

While this is a long way from being sufficient for doing mathematics, I do think this conception of what mathematics is matters a great deal. In particular, if I had a question and didn’t know whether the answer was known, or whether there were techniques that might help, I had a good sense of whom to ask and which books to look in. It was a sort of technical, academic equivalent of Google-fu: understanding the field well enough to search through it efficiently.

I do not have this sense for deep learning research.

I know there are three fundamental problems in machine learning: supervised, unsupervised, and reinforcement learning. I know that supervised learning problems have many, many avenues of attack, that reinforcement learning has made progress in large part by trying to reduce itself (via something like deep approximate Q-learning) to a supervised learning task, and that unsupervised learning is difficult because it’s unclear how to do this reduction.
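To make that reduction concrete, here is a minimal sketch, assuming a tiny tabular problem rather than a deep network. Each logged transition is turned into an ordinary regression example whose label is the reward plus the discounted best value of the next state, and the value estimates are then nudged toward those labels. The names `q_learning_targets` and `regression_step` and the toy table are made up for illustration; in deep Q-learning a neural network would stand in for the table.

```python
from typing import List, Sequence, Tuple

# One logged transition: (state, action, reward, next_state, done).
Transition = Tuple[int, int, float, int, bool]


def q_learning_targets(
    batch: Sequence[Transition],
    q_table: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[Tuple[int, int, float]]:
    """Turn transitions into (state, action, target) regression examples."""
    examples = []
    for state, action, reward, next_state, done in batch:
        # Bootstrapped label: reward plus the discounted best next-state value
        # (no future value if the episode ended on this transition).
        future = 0.0 if done else gamma * max(q_table[next_state])
        examples.append((state, action, reward + future))
    return examples


def regression_step(
    q_table: List[List[float]],
    examples: Sequence[Tuple[int, int, float]],
    lr: float = 0.5,
) -> None:
    """One pass of squared-error gradient steps toward the fixed targets."""
    for state, action, target in examples:
        q_table[state][action] += lr * (target - q_table[state][action])


# Toy usage: two states, two actions, hypothetical numbers.
q_table = [[0.0, 0.0], [1.0, 2.0]]
batch = [(0, 1, 1.0, 1, False), (1, 0, 5.0, 0, True)]
examples = q_learning_targets(batch, q_table, gamma=0.5)
regression_step(q_table, examples, lr=0.5)
```

Once the targets are computed, nothing in `regression_step` knows it is doing reinforcement learning, which is the sense in which the problem has been reduced to a supervised one.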

But if I have a question like, “how can I use transfer learning to apply knowledge from a large dataset to a small dataset?” I’m going to have a lot of trouble answering it unless I can guess the magical password: this is the “few-shot learning” problem, which is widely studied under that name. Even if I do get that far, I often don’t know how to identify the latest updates in the field. I managed mostly on my own to find a recent paper from DeepMind (though is 2016 really recent in this paradigm? How would I know?) detailing “matching networks,” which take bottlenecks from well-trained networks and then train LSTMs on one-shot learning tasks using the bottlenecks as features. That is actually very helpful for the application I have in mind. But when my task moves from a one-shot learning task to simply a small-data task (accumulating maybe 10,000 examples before the model becomes obsolete), will the approach still help me after such a complete change of paradigm? And what if I also have a problem with strong imbalances in my data? Will I be able to layer on a solution I find by googling that problem, or will there be major interference? And it is going to be a pain to combine those codebases, since I will probably have to sort through code from two different grad students replicating two different experiments.
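For what it’s worth, the core matching step is small enough to sketch. This is a simplified illustration, not the paper’s full model: it assumes the bottleneck features have already been extracted by some pretrained network, and it leaves out the learned LSTM embeddings entirely. A query is compared to each labelled support example by cosine similarity, the similarities go through a softmax, and the resulting weights are summed per class. The function names and the toy vectors are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_predict(
    query: Sequence[float],
    support_features: Sequence[Sequence[float]],
    support_labels: Sequence[int],
    num_classes: int,
) -> List[float]:
    """Class probabilities as a softmax-weighted vote over the support set."""
    sims = [cosine(query, f) for f in support_features]
    peak = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in sims]
    total = sum(weights)
    probs = [0.0] * num_classes
    for w, label in zip(weights, support_labels):
        probs[label] += w / total
    return probs


# Toy usage: one labelled example per class (a one-shot task),
# with hypothetical 2-dimensional "bottleneck" features.
support = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
probs = match_predict([1.0, 0.0], support, labels, num_classes=2)
```

Nothing here is retrained when a new class shows up; adding a class just means adding its examples to the support set, which is part of the appeal for one-shot tasks.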

Of course, these sorts of difficulties are a reasonable thing to spend time working on, and reading other people’s code is an important part of being a software engineer (never mind that I’m not a software engineer). And the questions I have are the sorts of stupid questions that I would (now that I’ve spent a lot of time as a grad student) feel very comfortable asking if I were a grad student and there were professors around who had some reason to spend time talking to me.

But I am not a grad student, and I don’t have a lot of great opportunities for mentorship around me right now (though I’m hoping that may change soon). I’m not sure how to develop this intuition: how to find the correct tools, how to judge whether and how to start hitting things with the standard successful tools, or how to efficiently modify other people’s ideas and code to get the results I want.

And beyond that, even if I did have this intuition, it wouldn’t solve the problem for everyone else. Maybe there are resources on this subject that would be helpful, but I’m sure there aren’t online courses about it, and I don’t trust textbooks too deeply when it comes to cutting-edge CS research. And I’m not even sure if a book COULD convey what I’m talking about! Certainly it would be much harder for me to write in a way that conveys my intuition about academic math than it is for me to write in a way that explains specific math.

This does leave me with a question though, for readers who engage with deep learning: how did you develop your intuition for research in the field? What would you recommend for newer researchers?