Learning Deep Learning: Joining data science research as a mathematician

About two years ago I finished my PhD in mathematics on an obscure technical topic in number theory. I left academic math because I wanted to do something that had a bigger (i.e. any) impact on the world around me. I also wanted to get out of the extremely perverse academic job market.

Since then, I’ve designed and taught courses on machine learning and am now working as a data scientist for a large company you’ve heard of (but not that one). In some respects I feel like my background in math prepared me for this job better than I can imagine a data science program doing. I think my insistence on a higher standard of evidence than p = 0.05 is one of the most important things I’ve brought to the table in all the projects I’ve touched. I’ve also gotten a lot out of my background reading LessWrong, mostly because it’s the only place I’ve ever really studied statistics. You’d think you couldn’t get a math PhD without doing at least a little stats, but you’d be wrong.

Anyway, there is one aspect of data science where I’m definitely behind the curve, and that’s the software engineering side. In particular, as a mathematician I’m very ready to grab hold of a lot of abstraction, tuck it into a black box, and only reopen it when it doesn’t work exactly how I expect. But in the modern data science community, there are a ton of abstractions and they’ve only been boxed up inconsistently.

I read a lot about deep learning research, both on LessWrong and for work, and there are a ton of interesting experiments I would like to replicate and a handful of original research ideas I’d like to try out, just as brief prototypes to see if they are even worth exploring further. To do this, what I want is something like the ability to write the following code:

model = pre_trained_alexnet()

GAN = generative_adversarial_network()
model.transfer(new_task())
GAN.aggrieve(model, new_task())

While it may be possible to do things like this, especially with libraries like Keras, this isn’t where most introductions to deep learning start.
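In fact, Keras gets surprisingly close to the first half of that wish list. Here is a minimal transfer-learning sketch; Keras doesn’t ship a pretrained AlexNet, so VGG16 stands in for it, and the ten-class output head is a hypothetical placeholder for new_task():

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load convolutional layers pretrained on ImageNet and freeze them
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# Attach a fresh classification head for the new task
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # hypothetical: 10 classes in the new task
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

The adversarial half is harder: there’s no one-line GAN.aggrieve, though libraries like CleverHans do package up standard adversarial-example attacks.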

Frustratingly for me, they almost all start with a long series of videos on how to implement backpropagation, insisting that it’s not really that bad even if you don’t follow all the calculus. To me this feels like accomplishing nothing: these parts of the process have already been well established and optimized beyond my ability to contribute meaningfully. What I want to work with is higher levels of abstraction: which architectures perform well on which problems, what level of data augmentation improves accuracy and at what point it starts to hurt, etc.
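To make that concrete, the sort of experiment I have in mind is a sweep over augmentation strength, which Keras makes cheap. The strength values, dataset, and model below are hypothetical placeholders:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical sweep over augmentation strength; x_train, y_train, x_val, y_val,
# and model are placeholders for a real dataset and network.
for strength in (0.0, 0.1, 0.3):
    datagen = ImageDataGenerator(
        rotation_range=int(45 * strength),
        width_shift_range=strength,
        height_shift_range=strength,
        horizontal_flip=strength > 0,
    )
    # model.fit(datagen.flow(x_train, y_train, batch_size=32),
    #           validation_data=(x_val, y_val), epochs=10)

Comparing validation accuracy across the sweep is exactly the “what level helps, what level hurts” question above.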

Anyway, I don’t get anywhere by sitting around complaining about how teaching is hard and not everyone is perfect at adapting to my unique situation as a student, so I’m focusing on self-improvement, and I’m hoping to keep myself accountable for at least a couple of weeks by writing about my progress. With any luck this will end up being a useful resource for others in my circumstances, but no promises.

So for now, I’ll outline a few near-term (the next week), medium-term (by the end of the year), and long-term (within a couple of years) goals that I have for myself.

Near-term activities:

Right now I am taking the Coursera Deep Learning Specialization. I’m not very synced up with the official course schedule, mostly because in these initial “compute derivatives and parrot definitions of bias and variance” stages I can get through 2-3 weeks of material in a day. I don’t feel like I’ve had a strong learning experience yet, though I’m optimistic that this will improve in later courses. In the meantime, it keeps me focused, reinforces the basics, and will give me a certificate I can put on a resume, which should help in future interviews.

Medium-term goals:

There are a ton of research papers whose results I’d like to replicate, but there are two in particular whose connections I’d like to understand: ‘Deep Reinforcement Learning from Human Preferences’ by Christiano et al., and ‘“Why Should I Trust You?”: Explaining the Predictions of Any Classifier’ by Ribeiro et al.

The first uses reinforcement learning augmented by occasional, sparse human feedback (preferences between pairs of short trajectory segments) to learn complicated reward functions. The second constructs model-agnostic explanations for individual predictions of supervised learners, and describes a mechanism by which non-experts can select high-quality models, or potentially improve them, by rating the explanations given for different predictions.
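The second paper’s method is already packaged: Ribeiro et al. released it as the lime library. A self-contained toy example of explaining one image classification looks something like this, where the random image and dummy classifier are placeholders for a real model:

import numpy as np
from lime import lime_image

def classifier_fn(images):
    # Placeholder: any function mapping a batch of images to class probabilities
    return np.random.rand(len(images), 10)

image = np.random.rand(64, 64, 3)  # placeholder image
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(image, classifier_fn, top_labels=5, num_samples=100)
img, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True)
# mask highlights the superpixels that pushed the classifier toward its top label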

I’d like to apply the process of the first to the second: to see whether human ratings of explanations can serve as the feedback signal for retraining a very poorly calibrated image classifier, or for making such a network more robust to adversarial examples. Hopefully in the next 10 weeks I can learn enough basic tooling for deep learning that setting up this kind of experiment is not so daunting.
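The core of the first paper’s machinery, at least, is small enough to sketch now: a reward network fit by cross-entropy on pairwise human preferences. Everything below (the layer sizes, the observation format) is a placeholder of my own, not the paper’s exact setup:

import tensorflow as tf

# Model P(segment A preferred over segment B) as a softmax over the two segments'
# total predicted rewards, as in Christiano et al., and fit by cross-entropy.
reward_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # scalar reward per observation
])

def preference_loss(seg_a, seg_b, prefer_a):
    # seg_a, seg_b: (timesteps, obs_dim) trajectory segments; prefer_a: human's choice
    r_a = tf.reduce_sum(reward_net(seg_a))  # total predicted reward of segment A
    r_b = tf.reduce_sum(reward_net(seg_b))
    logits = tf.stack([r_a, r_b])[tf.newaxis, :]  # shape (1, 2)
    label = tf.constant([0 if prefer_a else 1])   # index of the preferred segment
    return tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits)

For the experiment above, ‘segments’ would be replaced by explanations and ‘preferences’ by the ratings non-experts assign them.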

Long-term goals:

What I would really like to have is a nice setup of things like Atari environments for deep reinforcement learning, the ability to easily do apprenticeship learning and inverse reinforcement learning in those environments, and a familiar code base for building evolutionary algorithms and adversarial examples. My guess is that 80-90% of the codebase I want already exists, but that something like 30-40% of it is maintained on grad students’ personal GitHub pages and takes significant work to get running.
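The Atari piece is the most settled of these. OpenAI’s gym library (my assumption for the tool here, since it’s the de facto standard) gives the whole environment loop in a few lines:

import gym

# Random-agent loop in an Atari environment; assumes gym is installed
# with the Atari extras (pip install gym[atari]).
env = gym.make("Pong-v0")
obs = env.reset()
for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()

The apprenticeship/inverse-RL and evolutionary pieces are exactly the parts I expect to find scattered across those personal repos.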

I’ll try to continue this with 10-15 minute TIL (“today I learned”) updates until November, when my writing time will be dedicated to NaNoWriMo.

If anyone wants to follow along, or wants to talk about studying on a Discord some time, or has suggestions for good/better places to learn these sorts of things, please let me know!