Did Google actually say how long it took to train Alpha Go?
They did. In the methodology part they give an exact breakdown of how much wallclock time it took to train each step (I excerpted it in the original discussion here or on Reddit), which was something like 5 weeks total IIRC. Given the GPU counts on the various steps, it translated to something like 2 years on a regular laptop GPU, so the parallelization really helped; I don’t know what the limit on parallelization for reinforcement learning is, but note the recent DeepMind paper establishing that you can even throw away experience-replay entirely if you go all-in on parallelization (since at least one copy will tend to be playing something relevant while the others explore, preventing catastrophic forgetting), so who knows what one could do with 1k GPUs or a crazy setup like that?
The answer is “mine Bitcoin in the pre-FPGA days” :-)
This year Nvidia is releasing its next generation of GPUs (Pascal) which is supposed to provide a major speed-up (on the order of 10x) for neural net applications.
They did. In the methodology part they give an exact breakdown of how much wallclock time it took to train each step (I excerpted it in the original discussion here or on Reddit), which was something like 5 weeks total IIRC. Given the GPU counts on the various steps, it translated to something like 2 years on a regular laptop GPU, so the parallelization really helped; I don’t know what the limit on parallelization for reinforcement learning is, but note the recent DeepMind paper establishing that you can even throw away experience-replay entirely if you go all-in on parallelization (since at least one copy will tend to be playing something relevant while the others explore, preventing catastrophic forgetting), so who knows what one could do with 1k GPUs or a crazy setup like that?
The answer is “mine Bitcoin in the pre-FPGA days” :-)
This year Nvidia is releasing its next generation of GPUs (Pascal) which is supposed to provide a major speed-up (on the order of 10x) for neural net applications.