Rage Against The MOOChine

[This started as a review of Andrew Ng’s Machine Learning course for a math and computer science guide I was making, but it spiraled into a rant against MOOCs (massive open online courses). I left it unposted for a while, but I think it still mostly captures the problems with the current MOOC paradigm.]

Course: Machine Learning, Andrew Ng

I started out excited for this course and left disappointed, with a changed outlook on online learning.

I’ll start by saying that I liked the theory section of this course. I think Andrew did a good job of explaining the theory behind why he was doing what he was doing. I liked the explanations of supervised and unsupervised learning, and the later chapters on recommender systems and principal component analysis were also very interesting.

But the problem is that, beyond a brief overview of the skills, there wasn’t much depth to the course. It glossed over math topics like how to find the partial derivatives of the cost function for gradient descent. As somebody without a rigorous math background I was fine with that, because I don’t know how to take partial derivatives. Then we got to programming in Octave. During the lecture he said it was better this way because it’s faster and doesn’t really pick sides on which data science programming language you should use. The exercises being in Octave could be overlooked if the programming exercises were good, but sadly they aren’t.
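For context, here is roughly the step that gets glossed over. It is sketched for the squared-error cost of linear regression with a linear hypothesis. The notation is the standard textbook form and is an assumption here, not copied from the course materials. The derivation needs only the chain rule, because the derivative of θᵀx with respect to θ_j is x_j:

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2,
  \qquad h_\theta(x) = \theta^\top x \\
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)
     \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
   = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \\
\theta_j &:= \theta_j - \alpha\,\frac{\partial J}{\partial \theta_j}
\end{aligned}
```

The last line is the gradient descent update with learning rate α, applied to every θ_j simultaneously.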

The programming assignments don’t teach you anything, at least not anything I can say at this point is useful. Usually they amount to transcribing the loss function of the model you are learning from its mathematical representation into code. Everything you need is usually given as a hint in the comments at the top of the problem, or its vectorized form is spelled out in the document you work through alongside it.
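To show what "transcribing the loss function" looks like outside Octave, here is a minimal NumPy sketch of the vectorized squared-error cost for linear regression, its gradient, and batch gradient descent. It assumes the design matrix `X` already has a leading column of ones for the intercept. The function names `compute_cost`, `gradient`, and `gradient_descent` are hypothetical and are not the course's own starter code.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)**2)."""
    m = y.shape[0]
    residuals = X @ theta - y  # h_theta(x) - y for every example at once
    return (residuals @ residuals) / (2 * m)

def gradient(X, y, theta):
    """Vectorized partial derivatives of J: (1/m) * X^T (X theta - y)."""
    m = y.shape[0]
    return X.T @ (X @ theta - y) / m

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    """Plain batch gradient descent; returns final theta and the cost history."""
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * gradient(X, y, theta)
        history.append(compute_cost(X, y, theta))
    return theta, history

# Hypothetical toy data that exactly follows y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column of ones, then x
y = 1.0 + 2.0 * x
theta, history = gradient_descent(X, y, np.zeros(2))
print(theta)  # approaches [1, 2]
```

The loop is the part the course usually hands you. The two one-line formulas are the part you are asked to type in.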

This made me feel like I was mostly doing this for each coding assignment:

It was trivial, and it felt trivial. All of the actual hard parts of making the model were usually done for you: the data was collected, cleaned, visualized, loaded, split, and then sent to a function where you just typed in the loss function pretty much as described. The only time I really ran into problems getting code to run was when I didn’t understand Octave syntax. Around the midpoint of the course I started wondering how much it matters whether I know Octave syntax if I never plan on using it again. Why bash my head against esoteric Octave features if I’m just going to relearn all of this in Python? Why not just do it in Python to begin with?

I think the main reason it’s not in Python is that the course is older. Andrew Ng’s most recent specializations are done in Python, and I think that’s because Python is now the industry leader. I understand it would be too much to rebuild the course from the ground up, but I do believe that people learning Octave from this course for practical machine learning are mostly wasting their time. It would be better to implement the coding assignments in Python using NumPy and PyTorch, both to test your own learning and to get used to manipulating vectors and matrices in Python. I was scrolling through the reviews of this course, and one of the more glowing ones said they fondly remembered looking through Octave documentation for hours trying to get their code to work. I couldn’t help but think, “what a waste of time.”

So the coding assignments were kinda sucky and required outside work to be practical in the modern machine learning environment. That’s rough, but at least the theory is good, right? Yeah, it is good. Andrew Ng is a good teacher, with nice explanations of one topic flowing into the next. But as with the programming assignments, the further I got into the course, the more I started to ask myself, “what am I really doing here?”

The problem is math. When going over principal component analysis and anomaly detection, Andrew Ng did a great job of brushing past the math to give a broader overview of how these things work. That’s great for getting the broad overview, but not very useful beyond it. For things like PCA or calculating loss functions, there are already modules in PyTorch that will just do it for you. I learned in one of my other MOOCs that library implementations are usually better than writing your own (except when you are trying to understand how they work), mainly because practitioners built them to be as fast and computationally efficient as possible. So the reason for learning the theory of anomaly detection, PCA, gradient descent, and all the other machine learning algorithms would be to have a baseline of knowledge to build on. But...
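As an illustration of what such a library call hides, here is a minimal PCA sketch in NumPy using the singular value decomposition. It assumes rows are samples and columns are features. The function name `pca` is hypothetical, and the variance uses the sample (n − 1) convention. In practice you would reach for an existing implementation, such as scikit-learn's `sklearn.decomposition.PCA` or PyTorch's `torch.pca_lowrank`.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)  # PCA assumes mean-centered data
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]  # shape (k, n_features)
    explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

# Hypothetical toy data: points lying exactly on the line y = 2x.
points = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
projected, components, variance = pca(points, k=1)
reconstructed = projected @ components + points.mean(axis=0)
print(np.allclose(reconstructed, points))  # True: one component captures a line
```

The sketch is only a few lines, but knowing why centering matters or why the singular vectors are the right directions is exactly the linear algebra the course skips.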

(I typed “Andrew Ng math memes” into Google for this image; there are many like it.)

Math is hard. AI and machine learning have been getting a lot of hype in popular culture, with recent interesting releases making people like me want to join the field and contribute. The problem is that, I’m assuming, a large chunk of us haven’t developed a good enough math skill set but still want to get involved where all the cool stuff is happening. This means teachers are incentivized to make courses that are accessible to the large number of people who want to join ML but know little (or almost no) math. If they don’t, the courses that are accessible to those lacking higher-level math skills will make the money instead of them. This is also why, I’m assuming, many machine learning courses include an “intro to Python” section as well. And I’m going to be pessimistic and assume that teaching Python isn’t about getting people with many years in software development up to date with Python. It’s there to seemingly lower the bar for entry, giving the unassuming learner the idea that math, ML, and Python can all be learned together in one big bundle, which usually isn’t the case. (Theoretically it could be, but it would need to be a really, really, really big book or course.)

Math is vitally important in this field. Every lecture in this course contained math, and these models only function and make sense because of math. So the course seems to do students a disservice by not being more rigorous about it, and in my opinion brushing over the math leaves future would-be practitioners worse off. But again, I understand why it is this way: many new students don’t want to deal with the long road to math proficiency, and the teachers of courses like this don’t want to limit their applicant pool.

This is all to say that I don’t think learning PCA or anomaly detection was good for me with my limited math skills. I enjoy having a broad idea of what they do. But with no background in how those methods came to be or how they could be built on in the future, I realized I was Goodharting myself (optimizing for the measure instead of the thing it was supposed to measure) by learning the theory provided in this course. I wasn’t building strong foundations of knowledge that would help me in future rigorous applications; I was building little islands of knowledge with no coherent whole. I was optimizing for learning the course content because that’s what I thought would help me become a machine learning engineer. The same applied to the coding exercises.

But the more I thought about it, the more I saw this trend in almost all of the MOOCs I’ve taken. I was Goodharting myself almost the whole time. I had the autodidact equivalent of tires stuck in mud: I was moving through the courses fast, but getting almost nowhere. It was a disappointing realization, and if I sound bitter it’s because I am. I don’t know if other online learners feel the same, but it is disheartening. I still like MOOCs as a concept, and I think they could really help with practical understanding, but the current approach isn’t doing it. The current ML MOOC landscape is filled with courses that are too coddling and shallow for anybody who wants real working knowledge. I believe these bad courses are produced, intentionally or not, by perverse incentives: educators are catering to the misguided wants of their learners. There are exceptions, but they are few and far between, and this course isn’t one of them.

I don’t want to optimize for the wrong thing; I want to gain skills and work on hard problems. I think the modern learning landscape sucks because of these misaligned incentives. I can complete the course and get a little certificate, which looks cool to my layman peer group. But if I were put in front of a raw dataset right now and told to build a machine learning model, I wouldn’t do a very good job (if I could even make one), and I don’t like that.

So I’m gonna accept that this experiment didn’t work out. I tried to learn computer science through MOOCs, and I don’t feel comfortable with what I’ve learned, even though I’ve been through 7 different courses from a handful of different websites by now. I’m going to reevaluate my courses and build a better guide to learning math and computer science, one that will actually build long-lasting, cumulative skills.


Reflection: After leaving this post alone for a month and reading it again, I still think it’s correct. I think most academic institutions, online or not, lead people to optimize for the degree / certificate / accreditation rather than practical knowledge. This is partly because people starting a program don’t know what practical knowledge will look like, and things in the field change all the time. So accreditation seems to be just a rough proxy for “this person probably knows enough in this field to be useful.”

I am sympathetic to how hard it is to build useful courses, and how hard it is to tell whether a student actually knows something or can just regurgitate it and forget it immediately afterward. It’s uncharitable to MOOCs to say that only they lack usefulness, since they were modeled directly on institutional classes. But the lack of student interaction and of a surrounding environment (clubs, professor interaction, etc.) does leave MOOCs more limited as a useful, comprehensive learning environment.

I think the biggest problem MOOCs seem to have is their lack of progressive structure. A degree has set classes increasing in difficulty, so the higher-level classes know that the student is prepared (or at least passed the lower-level classes) to be there. If Andrew Ng’s course were the fifth course, after calculus, linear algebra, statistics, and probability theory courses, it would feel like a nice payoff. But the course assumes nothing of its learners and suffers for it. The universities that offer degree-like multi-course MOOCs usually have an application process and cost thousands of dollars. That is prohibitive but understandable, as they usually grant a degree that doesn’t say “online” on it, and for societal reasons “real” accreditation can’t come cheap.

I think a solid math and computer science curriculum could be made much cheaper and more coherent. Since neither the learners nor the educators have any incentive to change how MOOCs fit together or function for the better, I decided to make a better guide for learning. You can find it here as a GitHub README. I spent a lot of time on it, but it’s still in its infancy; it uses mostly textbooks, sprinkled with the MOOCs I think are / will be actually useful. I would greatly appreciate any comments to make it better, especially on what resources would be good for self-learning Machine Learning / Deep Learning with rigorous math. Either way, I will keep it updated for the autodidact who wants to learn Math and CS in a more rigorous way, no matter where they start. I obviously can’t offer accreditation, so hopefully it will be helpful for people who want to optimize for pure competency in the field. I’m staking my own future on it, so I have every incentive for this guide to be as good and useful as possible.