If the goal of your comment was to push people to learn things you think they should know, pointing towards some stuff (not an exhaustive list) is the bare minimum for that to be effective.
Here’s an obvious next step for people: google for resources on RL, ask others for recommendations on RL, try out some of the resources and see which one works best for you, and then choose one resource and dive deep into it, potentially repeat until you understand new RL papers by reading. I think people would be better off executing that algorithm than looking at specific resources that I might name.
I wouldn’t be surprised if other people have better algorithms for self-learning new fields—I’m pretty atypical and shouldn’t be expected to know what works for people who aren’t me. E.g. TurnTrout has done a lot of self-learning from textbooks and probably has better advice.
I would hope most AF readers are capable of coming up with and executing something like this algorithm. If not, there are bigger problems than the lack of RL knowledge.
----
I also don’t buy that pointing out a problem is only effective if you have a concrete solution in mind. MIRI argues that it is a problem that we don’t know how to align powerful AI systems, but doesn’t seem to have any concrete solutions. Do you think this disqualifies MIRI from talking about AI risk and asking people to work on solving it?
E.g. TurnTrout has done a lot of self-learning from textbooks and probably has better advice [for learning RL]
I have been summoned! I’ve read a few RL textbooks… unfortunately, they’re either a) very boring, b) very old, or c) very superficial. I’ve read:
Reinforcement Learning by Sutton & Barto (my book review)
Nice book for learning the basics. Best textbook I’ve read for RL, but that’s not saying much.
Superficial, not comprehensive, somewhat outdated circa 2018; a good chunk was focused on older techniques I never/rarely read about again, like SARSA and exponential feature decay for credit assignment. The closest I remember them getting to DRL was when they discussed the challenges faced by function approximators.
AI: A Modern Approach 3e by Russell & Norvig (my book review)
Engaging and clear, but most of the book wasn’t about RL. Outdated, but 4e is out now and maybe it’s better.
Markov Decision Processes by Puterman
Thorough, theoretical, very old, and very boring. Formal and dry. It was written decades ago, so obviously no mention of Deep RL.
Neuro-Dynamic Programming by Bertsekas & Tsitsiklis
When I was a wee second-year grad student, I was independently recommended this book by several senior researchers. Apparently it’s a classic. It’s very dry and was written in 1996. Pass.
OpenAI’s several-page web tutorial Spinning Up in Deep RL is somehow the most useful beginning RL material I’ve seen, outside of actually taking a class. Kinda sad.
So when I ask my brain things like “how do I know about bandits?”, the result isn’t “because I read it in {textbook #23}”, but rather “because I worked on different tree search variants my first summer of grad school” or “because I took a class”. I think most of my RL knowledge has come from:
My own theoretical RL research
the fastest way for me to figure out a chunk of relevant MDP theory is often just to derive it myself
Watercooler chats with other grad students
Sorry to say that I don’t have clear pointers to good material.
Thanks for the in-depth answer!
I do share your opinion on the Sutton and Barto, which is the only book I read from your list (except a bit of the Russell and Norvig, but not the RL chapter). Notably, I took a lot of time to study action-value methods, only to realise later that a lot of recent work focuses instead on policy-gradient methods (even if actor-critic methods do use action values).
From your answer and Rohin’s, I gather that we lack a good resource on Deep RL, at least of the kind useful for AI Safety researchers. It makes me even more curious about the kind of knowledge such a resource would cover.
Here’s an obvious next step for people: google for resources on RL, ask others for recommendations on RL, try out some of the resources and see which one works best for you, and then choose one resource and dive deep into it, potentially repeat until you understand new RL papers by reading.
Agreed. Which is exactly why I asked you for recommendations. I don’t think you’re the only person someone interested in RL should ask for recommendations (I already asked other people, and knew of some resources before all this), but as one of the (apparently few) members of the AF with the relevant skills in RL, it seemed that you might offer good advice on the topic.
About self-learning, I’m pretty sure people around here are good on this count. But knowing how to self-learn doesn’t mean knowing what to self-learn. Hence the pointers.
I also don’t buy that pointing out a problem is only effective if you have a concrete solution in mind. MIRI argues that it is a problem that we don’t know how to align powerful AI systems, but doesn’t seem to have any concrete solutions. Do you think this disqualifies MIRI from talking about AI risk and asking people to work on solving it?
No, I don’t think you should point to a problem only when you have a concrete solution in hand. But solving a research problem (what MIRI’s case is about) is not the same as learning a well-established field of computer science (what this discussion is about). In the latter case, you ask people to learn things that already exist, not to invent them. And I do believe that showing some concrete things that might be relevant (as I repeated in each comment, not an exhaustive list) would make the injunction more effective.
That being said, it’s perfectly okay if you don’t want to propose anything. I’m just confused because it seems low effort for you, net positive, and the kind of “ask people for recommendation” that you preach in the previous comment. Maybe we disagree on one of these points?
Which is exactly why I asked you for recommendations.
Yes, I never said you shouldn’t ask me for recommendations. I’m saying that I don’t have any good recommendations to give, and you should probably ask other people for recommendations.
showing some concrete things that might be relevant (as I repeated in each comment, not an exhaustive list) would make the injunction more effective.
In practice I find that anything I say tends to lose its nuance as it spreads, so I’ve moved towards saying fewer things that require nuance. If I said “X might be a good resource to learn from but I don’t really know”, I would only be a little surprised to hear a complaint in the future of the form “I deeply read X for two months because Rohin recommended it, but I still can’t understand this deep RL paper”.
If I actually were confident in some resource, I agree it would be more effective to mention it.
I’m just confused because it seems low effort for you, net positive, and the kind of “ask people for recommendation” that you preach in the previous comment.
I’m not convinced the low effort version is net positive, for the reasons mentioned above. Note that I’ve already very weakly endorsed your mention of Sutton and Barto, and very weakly mentioned Spinning Up in Deep RL. (EDIT: TurnTrout doesn’t endorse Sutton and Barto much, so now neither do I.)
In practice I find that anything I say tends to lose its nuance as it spreads, so I’ve moved towards saying fewer things that require nuance. If I said “X might be a good resource to learn from but I don’t really know”, I would only be a little surprised to hear a complaint in the future of the form “I deeply read X for two months because Rohin recommended it, but I still can’t understand this deep RL paper”.
Hmm, I had not thought about that. It makes more sense to me now why you don’t want to point people towards specific things. I still believe the result will be net positive if the right caveats are in place (then it’s the other person’s fault for misinterpreting your comment), but that’s indeed assuming that the resource/concept is good/important and you’re confident in that.