Beginning Machine Learning

I recently finished the Machine Learning course on Coursera that is recommended by MIRI’s research guide for developing a practical familiarity with machine learning.

This post contains my thoughts about the course and tries to convey the updates my mental models went through as the eleven-week course progressed.

I started the course with the perspective of an experienced, employed software developer with an engineering degree who had never focused on machine learning before, so I’m sure there’s some background knowledge I take for granted, as well as some things I should have realized long before taking this course.

What is machine learning?

A definition introduced early in the lectures describes machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.

I knew that much before this class.

What I didn’t know were the nuts, bolts, and gears of how to start writing actual code that uses machine learning algorithms.

Prior to the course, I briefly tried to imagine how an application of machine learning might work (e.g. a program learning to play a game). What I came up with was a complex set of rules and loops where other complex rules somehow tweaked the original rules based on whatever results were used to represent desired outcomes.

What those rules should even be was vague, and I thought the whole system I imagined would be a fragile, error-prone maintenance nightmare.

That didn’t square with the robust uses of machine learning by tons of profitable companies, or with the fact that I knew neural nets are increasingly popular and useful. I had a rough conceptual sketch of neural nets being nodes connected in layers with weights and outputs but not how to represent that in code.

I was eager for this class to tell me what I was missing.

Machine learning is math.

March 6, 2018 - As of today, I’m in the middle of week 5, and my initial reaction to the course has been: I’m basically writing programs to do statistics.

Before this course, calculating a linear regression or using the normal equation wouldn’t have struck me as tasks of “machine learning.” I may have thought of these as tools data scientists use, but they seemed too basic for the hype around machine learning.
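To make the "it's just statistics" feeling concrete, here is a minimal sketch of a one-feature linear regression solved with the normal equation, theta = (X^T X)^-1 X^T y. It is written in plain Python rather than the Octave used in the course, and the function name and data points are made up for illustration. With a single feature plus an intercept, X^T X is a 2x2 matrix, so the inverse can be written out by hand.

```python
def fit_line_normal_equation(xs, ys):
    """Solve theta = (X^T X)^-1 X^T y for one feature plus an intercept.

    X has a column of ones and a column of x values, so X^T X is the
    2x2 matrix [[n, sum_x], [sum_x, sum_xx]] and X^T y is [sum_y, sum_xy].
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of X^T X; zero means every x is identical.
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; the line is not unique")

    # Apply the explicit 2x2 inverse to X^T y.
    theta0 = (sum_xx * sum_y - sum_x * sum_xy) / det  # intercept
    theta1 = (n * sum_xy - sum_x * sum_y) / det       # slope
    return theta0, theta1


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = fit_line_normal_equation(xs, ys)
print(theta0, theta1)
```

For more than one feature, the same equation is usually handed to a linear algebra library instead of being expanded by hand.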

In reality, the lines between artificial intelligence, data science, and machine learning are blurry and ill-defined. I think in many common uses, the phrases are interchangeable.

I should have expected some degree of buzzword bingo to be going on. This is technology after all.

I will say that a linear regression is entry level stuff, and machine learning gets way more complex, but everything still boils down to math! Even neural networks amount to feeding numerical data through a series of equations.
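As a rough illustration of "feeding numerical data through a series of equations," the sketch below pushes two numbers through a tiny network: one hidden layer of two units and one output unit, each applying a sigmoid to a weighted sum. The layer sizes, weights, and function names are arbitrary choices for this example and are not taken from the course.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: for each unit, a weighted sum of the inputs plus a bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Feed the inputs through every layer in order and return the final activations."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical hand-picked weights: each entry is (weights, biases) for one layer.
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: 2 units, 2 inputs each
    ([[1.0, -1.0]], [0.0]),                    # output layer: 1 unit, 2 inputs
]

output = forward([1.0, 2.0], network)
print(output)
```

Training is the part that chooses the weights; the forward pass itself is nothing more than this arithmetic.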

I imagine this is true even for AlphaZero.

I’m hammering this obvious-in-hindsight point home because this is the piece I was seriously missing prior to this class. It feels embarrassing to have gone through so much higher level math for my degree and never needed any of it for my programming jobs. I’m not sure I ever gave much thought to what anyone expected we would do with the math they taught us in school.

And I’m very chagrined I didn’t think of using probabilities in my pre-course wild guessing at a possible machine learning system.

A light bulb has come on for me that makes machine learning less mysterious. It’s math!

Okay, machine learning requires more than just math.

Many subskills, and a lot of knowledge that only comes from experience, go into making someone a better developer of machine learning algorithms.

If you don’t have good data, you’ve nothing to use your fancy math equations on. (Collecting and cleaning data sounds like an art.)

There’s also deciding which algorithms to use for which tasks.

There’s knowing how to verify that your algorithms are working correctly and are telling you what you think they’re telling you.

There’s knowing how to string together a pipeline of many single machine learning tasks that together accomplish more elaborate goals.

There’s knowing pitfalls and mistakes that are common when implementing machine learning algorithms.

There’s knowing techniques for dealing with overfitting or underfitting data.

Understanding computational complexity helps.

And there’s even knowing how to program in the first place.

Reality is Data

While doing this course, I also happened to read the book Algorithms to Live By: The Computer Science of Human Decisions.

It’s a clever book that shows many ways that computer algorithms can provide optimal solutions to everyday questions and situations, without ever getting a computer involved.

Algorithms in computer science do things with data. This book gave me the aha moment of understanding that reality IS data, and that’s why algorithms are applicable outside of the domain of computer science.

Meanwhile, the machine learning course reinforced for me that machine learning is how computer science has learned to make predictions using data.

It’s handy to wrap the making of predictions up in programs in order to extend our prediction making to data we haven’t already calculated everything for. Or data where we’re not sure what matters. Or making all the efforts we’ve gone through fast and reusable.

But when we do this, we are using hardware to facilitate a process we could, in theory, do without computers. Because it’s real. These concepts don’t only exist inside computers, they’re levers for the real world.

Machine learning is how computer science makes predictions using data. And reality is made up of data.

Feeling Potential on a Gut Level

I’ve written so many programs, but they were always driven by the steps a programmer could enumerate and understand. The clients and servers I could imagine talking to each other. The threads I could conceive of coordinating.

I thought I understood the potential of machine learning before, but it feels like it’s clicked on another level.

Drop some data into a neural network (it’s math!) and transformations happen that I can’t follow.

Machine learning algorithms aren’t anything truly intelligent yet, but man, the potential is huge.

Software is eating the world and machine learning is following on its heels.

Course Difficulty

Each of the eleven weeks of the course includes a couple hours of lectures, a couple quizzes, and a programming assignment estimated to take up to three hours to complete. The last two weeks don’t have programming assignments, which I think is intentional to give people playing catch-up a chance of passing the course before the session ends.

The vast majority of my effort on the programming assignments went into trying not to mess up vector/matrix math.

After the first assignment, where it became clear that well-vectorized code runs much faster than loops that add and multiply, I went straight for the vectorized implementations when doing assignments from then on.

The programming assignments all involved implementing math equations in reusable functions. (Machine learning algorithms are math! (Sorry, but I’m still giddy about this.))

The actual equations are provided in the lectures and assignment instructions, which made me feel vaguely like I was cheating.

It would certainly be too much to ask students just getting familiar with machine learning to come up with the equations for algorithms themselves, but I didn’t expect this class to be as easy for me as it turned out to be. Maybe it would have been harder if I weren’t a programmer. I wouldn’t want to take on the course if I’d never coded.

As far as math requirements go, linear algebra and calculus are involved, but understanding anything more complicated than matrix operations isn’t necessary to complete the course. (And the professor includes an optional review section on linear algebra with explanations of the necessary matrix operations.)

The professor says things like, “This value requires calculating the partial derivative,” but then also gives you what the partial derivative works out to be.

I also felt vaguely guilty that I didn’t remember enough to do the derivatives myself, but reviewing the calculus I haven’t touched since college is already part of my AI background learning plans, so I took the professor’s answers and moved on.
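For a sense of what "here is what the partial derivative works out to be" looks like once it is code, here is a small sketch using the squared-error cost for a one-feature linear regression. The cost function, the data, the learning rate, and all names are assumptions chosen for this example. The provided partial derivatives are compared against a numerical estimate and then used in a plain gradient descent loop.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost: J = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient(theta0, theta1, xs, ys):
    """The partial derivatives of J, written out the way a lecture would hand them over."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d_theta0, d_theta1


def numerical_gradient(theta0, theta1, xs, ys, eps=1e-6):
    """Estimate the same partial derivatives with central differences, as a sanity check."""
    d0 = (cost(theta0 + eps, theta1, xs, ys) - cost(theta0 - eps, theta1, xs, ys)) / (2 * eps)
    d1 = (cost(theta0, theta1 + eps, xs, ys) - cost(theta0, theta1 - eps, xs, ys)) / (2 * eps)
    return d0, d1


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# The provided formulas and the numerical estimate should agree closely.
analytic = gradient(0.5, 0.5, xs, ys)
numeric = numerical_gradient(0.5, 0.5, xs, ys)

# Gradient descent: repeatedly step against the gradient.
theta0, theta1 = 0.0, 0.0
alpha = 0.1  # learning rate, picked by hand for this data
for _ in range(5000):
    d0, d1 = gradient(theta0, theta1, xs, ys)
    theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1

print(analytic, numeric)
print(theta0, theta1)
```

Comparing a supplied derivative against a numerical estimate like this is a cheap way to catch a sign or indexing mistake without redoing the calculus.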

Even though the course was easier than I expected, I also expected to move through the material faster than I did. It wasn’t easy to get myself to sit down and listen to the lectures as promptly as I would have preferred. I finished each week on time, but part of me actually expected to finish the entire course several weeks ahead of schedule.

In the end, I finished all quizzes and programming assignments a week early and ended with a grade of 100%. Coursera will bug you 50+ times to purchase the certificate for the class, but it is completely unnecessary to do so to access all of the materials or even receive a grade.

Course Age

This course was originally created in 2011, so it may be a bit dated now. I still think it holds up well and is a worthwhile intro to machine learning.

I did all the programming assignments using the latest version of Octave. The course does not remotely attempt to introduce you to all of the machine learning frameworks or development environments out there. It is focused on foundational concepts and explaining some of the most common machine learning algorithms.

This course isn’t going to get one up to the bleeding edge of machine learning, but I doubt any eleven-week introductory course could do such a thing anyway.

The professor does say things along the lines of, “Now you know more than many of the so-called experts making big money on this stuff in Silicon Valley.”

However, the age of the course is why I can’t accept this claim at face value.

Imagining a Machine Learning Cookbook

March 7, 2018 - As I think about what I want to remember from this machine learning course, I don’t feel scared about forgetting the concepts and intuitions. I feel scared about forgetting the equations that make up the different machine learning algorithms.

I found myself briefly tempted to create a cheat sheet of machine learning equations, with minimal other explanation.

A follow up thought to that was: does a corpus of machine learning algorithms (with equations to implement them) already exist? I haven’t been able to find such a thing. My impression is that this information is scattered throughout many papers, textbooks, and websites.

I have a Machine Learning textbook I haven’t cracked open yet, but I doubt it contains the cookbook style approach I’m imagining.

I don’t see a reason why, in principle, cookbook style development wouldn’t be good deliberate practice for building the skill of creating code implementations of AI papers.

Maybe AlphaZero would be an advanced recipe.

How much further study should I do?

The course is a good start for the topic of machine learning, but one could always do more. My main goal is to flesh out the background knowledge I need to understand leading developments in the field of Artificial Intelligence.

I could work through a textbook next.

I could spend time coming up with my own applications, practicing without the training wheels of the course directing my coding efforts.

I could try to shoehorn applications of machine learning into my day job.

I could try assembling a cookbook like the one I imagined.

I could jump into trying to understand and implement code from machine learning papers.

Or perhaps I should set machine learning aside and focus on other topics, because this course was enough of an overview to move on.