jkrause

Karma: 55

jkrause 26 Feb 2015 18:42 UTC
14 points
in reply to: gwern’s comment on: “Human-level control through deep reinforcement learning”—computer learns 49 different games
Can confirm that hardware (and data!) are the two main culprits here. The actual learning algorithms haven’t changed much since the mid 1980s, but computers have gotten many times faster, GPUs are 30-100x faster still, and the amount of data has similarly increased by several orders of magnitude.

jkrause 26 Feb 2015 21:09 UTC
9 points
in reply to: skeptical_lurker’s comment on: “Human-level control through deep reinforcement learning”—computer learns 49 different games
That was indeed one of the hypotheses about why it was difficult to train the networks—the vanishing gradient problem. In retrospect, one of the main reasons why this happened was the use of saturating nonlinearities in the network—nonlinearities like the logistic function or tanh which asymptote at 1. Because they asymptote, their derivatives always end up being really small, and the deeper your network the more this effect compounds. The first large-scale network that fixed this was by Krizhevsky et al., which used a Rectified Linear Unit (ReLU) for their nonlinearity, given by f(x) = max(0, x). The earliest reference I can find to using ReLUs is Jarrett et al., but since Krizhevsky’s result pretty much everyone uses ReLUs (or some variant thereof). In fact, the first result I’ve seen showing that logistic/tanh nonlinearities can work is the batch normalization paper Sean_o_h linked, which gets around the problem by normalizing the input to the nonlinearity, which presumably prevents the units from saturating too much (though this is still an open question).

jkrause 26 Feb 2015 17:33 UTC
8 points
in reply to: skeptical_lurker’s comment on: “Human-level control through deep reinforcement learning”—computer learns 49 different games
Training networks layer by layer was the trend from the mid to late 2000s up until early 2012, but that changed in mid 2012 when Alex Krizhevsky and Geoff Hinton finally got neural nets to work for large-scale tasks in computer vision. They simply trained the whole network jointly with stochastic gradient descent, which has remained the case for most neural nets in vision since then.

jkrause 19 Mar 2014 3:39 UTC
2 points
in reply to: Ben Pace’s comment on: Open thread, 18-24 March 2014
Yes, this happens to me in Windows, but not Ubuntu (both Chrome).