Hi, thanks for sharing and experimentally trying out the theory in the previous post! Super cool.
Do you have the code for this up anywhere?
I’m also a little confused by the training procedure. Are you just instantiating a random vector and then doing GD with respect to the loss function you defined? Do the charts show the loss averaged over many random vectors (and splotch function variants)?
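For concreteness, here's a minimal sketch of the procedure I'm imagining: a random initial vector, then plain gradient descent. The loss here is a made-up quadratic stand-in (not your actual splotch loss), just to pin down what I mean by the question.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])  # stand-in for whatever the loss targets

def loss(v):
    # Placeholder quadratic loss, NOT the post's actual loss function.
    return np.sum((v - target) ** 2)

def grad(v):
    # Analytic gradient of the placeholder loss above.
    return 2.0 * (v - target)

v = rng.standard_normal(3)  # instantiate a random vector
lr = 0.1
for _ in range(200):        # plain gradient descent
    v = v - lr * grad(v)
```

Is it roughly this, repeated over many random seeds, with the charts showing the averaged loss curves?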
Overall enjoying this series and your take on CFAR-style rationality. Thanks for putting in the time to write this up.
Michael Nielsen also has some great stuff.
Especially his quantum.country and neural networks one.
Slight nitpick: since the logistic sigmoid isn’t actually used much in practice anymore, it might be better to start people new to the sequence with a more modern activation function like ReLU or tanh?
When I was first learning this material, people mentioned sigmoid so often that I assumed it was a good default, and only later did I learn that it hasn’t been the activation function of choice for a while. (See, for example, Yann LeCun here in 1998 on why the standard sigmoid has drawbacks.)
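To illustrate the drawback I mean: the sigmoid's gradient saturates for inputs away from zero, while ReLU's gradient stays at 1 for any positive input. A quick sketch (hypothetical values, just to show the saturation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s); peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

# At x = 5, the sigmoid gradient has nearly vanished (~0.0066),
# while the ReLU gradient is still 1.0.
```

This vanishing-gradient behavior is the usual argument against sigmoid in deep networks, and part of why I wish I'd been pointed at ReLU earlier.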
As a university student with ties to EA (and also looking at future opportunities), the EA forum post you linked gave some useful anecdotes to think about. Thank you for sharing the list.
Just wanted to thank you for writing up this series. I’ve been slowly going through the book on my own. Just finished Chapter 2 and it’s awesome to have these notes to review.
A friend of mine actually goes around every day with a GoPro, recording his interactions.
Also, I’m wondering if you have thoughts on where to store this preserved information? Making sure that future people have access to it seems like the important part. But obviously just making it all available publicly online for everyone seems too vulnerable. Maybe some sort of dead-man’s switch type setup, where it gets made public after you die?
Fading Novelty is the first post, so it’s supposed to be read from top to bottom.
Finally finished up polishing old posts in my series on instrumental rationality. Didn’t cross-post it to LW because much of the stuff is cannibalized, but the link is here https://mlu.red/. Posts are meant to be read sequentially, but I haven’t added “next post” functionality yet.
Whoa! I wrote about something similar here a while ago under the same name, at least about the aesthetics part.
Note: Anna Salamon has a public response on the FB post here (unsure to what extent it’s official)
Seconded. In my view, the anecdotes are there so that the idea is more salient and sticks around longer in your head.
Sure, you can read 10 self-help summaries in an hour, but I don’t think that gives you 10x the same amount of benefit as reading about one concept for an hour. (If anything, I don’t even think you get 1x the same amount of benefit, as you have to factor in potential confusion sorting everything out, etc.)
The padding can also be useful if you’re trying to learn via example, or learn what the stereotype of The Concept looks like.
LFD was my first intro to statistical learning theory, and I think it’s pretty clear. It doesn’t cover the No Free Lunch Theorem or Uniform Convergence, though, so your review actually got me wanting to read UML. I think that if you’re already getting the rigor from UML, you probably won’t get too much out of LFD.
I’m curious if you’ve looked at Learning From Data by Abu-Mostafa, Magdon-Ismail, and Lin? (There’s also a lecture series from Caltech based on the book.)
I haven’t read Understanding Machine Learning, but it does seem to be even more technical, given my skimming of your notes. However, the Abu-Mostafa et al. book does give a proof of why you can expect the growth function to be polynomially bounded for a set of points greater than the break point (if the VC dimension is finite), as well as a full proof of the VC Bound in the appendix.
Hmmm. I agree with you that fingernail biting didn’t seem to fit the paradigm. However, I did Google “stop biting fingernails” to see if there were any domain-specific suggestions. (You may have already done this.)
Two things that maybe seemed promising:
Wear gloves to prevent easy access to hands
Get a fidget toy to keep your hands otherwise busy
Something else which seems maybe useful is to be mindful/reflective after you’ve noticed that you’ve done it.
Otherwise, I (at least right now) don’t know much about breaking habits without knowing the trigger.
Thanks for the info, Ozzie!
I checked out Observable some more. I think it might actually be a little heavier than what I want. Unsure if I’ll do the coding exercises beforehand (and just post the results + code), or if I’ll go through the work of setting up an interactive notebook so readers can follow along.
I looked into self-hosting it, since the default option seems to be a notebook hosted on their site. My understanding is that there’s a way to embed notebooks on my own site (or that the runtime environment is open-source?)
I’m going to spend some of the winter holidays working on the problem sets in Abu-Mostafa et al.’s Learning From Data. I think this should be fun, and I’ll also look into learning Observable so I can make interactive notebooks for the coding problems.
This piece was helpful in outlining how different people in the AI safety space disagree, and what the issues with Paul’s approaches seem to be. Paul’s analogy about solving hard problems was especially interesting to me (the point that most problems don’t seem to occupy a position midway between totally impossible and solvable). The inline comments by Paul were also good to read as counterpoints to Eliezer’s responses.
Sidenote: Loved the small Avatar reference in the picture of the cabbage vendor.