This whole evolution looks more and more like expert systems from the 1980s, where people kept adding more and more complexity to “solve” a specific problem. For RL, we started with a simple DQN that was elegant, but the new algorithms now look like a massive hodgepodge of band-aids. NGU, as it is, is extraordinarily complex and looks like an ad hoc mix of various patches. Now, on top of NGU, we are also throwing in a meta-controller and even bandits, among other things, to complete the proverbial kitchen sink. Sure, we get to claim victory on Atari, but this is far away from elegant and beautiful. It would be surprising if this victory generalizes to other problems where folks have built different “expert systems” specific to those problems. So all this feels a lot like the Watson-winning-Jeopardy moment to me...
PS: Kudos to DeepMind for pushing for the median, or even better, bottom-percentile scores, instead of the simplistic average metric, which also hides variance.
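To see why the choice of aggregate matters, here is a toy sketch (the scores are made up, not from the paper): a single lucky run can inflate the mean, while the median and a bottom percentile reflect typical and worst-case behavior across seeds.

```python
import numpy as np

# Hypothetical per-seed scores on one game: most runs do poorly, one run excels.
scores = np.array([0.1, 0.2, 0.2, 0.3, 9.0])

print(np.mean(scores))           # 1.96 -- inflated by the single outlier run
print(np.median(scores))         # 0.2  -- the typical run
print(np.percentile(scores, 5))  # 0.12 -- bottom percentile exposes the worst runs
```

The mean suggests the agent scores ~2, but four of the five runs never got past 0.3; the median and low percentile make that visible.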
I’m going to parrot a comment from the Hacker News discussion on this:
A lot of techniques had to be thrown together to make this work, and in that sense it reminds me of Rainbow DQN, since that also combined a bunch of things to solve the problem. However, a quick glance at the tables in appendix H.4 of the paper makes it hard to tell if this is really much of an improvement over the other Atari agents DeepMind has put together.