I like your comments, 307th, and your linked post on RL SotA. I don’t agree with everything you say, but some of what you say is quite on point. In particular, I agree that ‘RL is currently being rather unimpressive in achieving complicated goals in complex wide-possible-action-space simulation worlds’. I agree that some fundamental breakthroughs are needed to change this, not just scaling existing methods. I disagree that such breakthroughs will necessarily require many calendar years of research. I think the big research labs will probably soon turn to focus more fully on tackling complex-world RL, and that it won’t be long at all before significant breakthroughs start being made.
I think that rather than measuring research progress in years, or even ‘researcher hours’, it’s more helpful to think in terms of ‘research points’ devoted to the specific topic. An hour from a highly effective researcher at a well-funded lab, with a well-set-up research environment that makes new experiments easy to run, is worth vastly more ‘research points’ towards a topic than an hour from a compute-limited grad student without polished experiment-running code patterns, without access to huge compute resources, and without much experience running large experiments over many variables.