AlphaGo Zero comments

Link post

DeepMind has published a new paper in Nature detailing “AlphaGo Zero”, a Go AI that was trained using only self-play. Within 24 hours, AlphaGo Zero equalled the ability of a system trained with supervised learning on a corpus of professional Go games (using the same computing resources), and within 48 hours it exceeded the ability of the version of AlphaGo that defeated Lee Sedol.

This surprised me. I remember Demis Hassabis saying during the coverage of the Lee Sedol match that he’d like to try learning Go from scratch, using only self-play and no supervised learning from human games. I thought that sounded much harder, and guessed that if supervised training took on the order of a month, then learning from scratch would take anywhere from three months to several years to recapitulate what had been learned from human games.

Other things I noted from the paper: AlphaGo Master (which played and won 50 games against professional players in 2016-2017) used a different architecture from the previous versions, and is about 12 times more computationally efficient as well as being a significantly better player. Zero is based on that architecture (I think), and took about 35 days of training (i.e. self-play) to equal and then exceed Master’s ability. I don’t know how long Master took to train, but going by the results of this paper, I’m guessing that supervised training would have given it a head start of only a day or so over Zero.