Yeah, of course the brain was always an example of a big neural net that worked; the question was how accessible that design is/was. The core of the crucial update for me (which I can’t pinpoint precisely, but I’d guess was somewhere between 2010 and 2014) was the realization that GD with a few simple tricks really is a reasonable general approximation of Bayesian inference, and a perfectly capable global optimizer in the overcomplete regime. The latter seems obvious now in retrospect, but apparently wasn’t so obvious when nets were small: it was just sort of known/assumed that local optima were a major issue. Much else just falls out from that. The ‘groupthink’ I was referring to is that some here are still deriving much of their core AI/ML beliefs from reading the old sequences/lore rather than the DL literature and derivations.