I am not an ML research scientist, but I have studied and used ML, and I find it very interesting.
As far as I understand, deep learning can discover and fit essentially any nonlinear dynamics in a dataset, provided the model is then trained/regularized/cross-validated to prune away over-fitting. If we accept the view that reality is just a huge set of nonlinear equations and information, and NNs/DL can discover these at any level of granularity, then it is a reasonable prediction that they are well positioned to be the best.
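To make the "fit any nonlinear dynamics" claim concrete, here is a minimal sketch (my own toy example, not anything from the discussion above): a single-hidden-layer tanh network, trained by plain gradient descent in numpy, learning a nonlinear function it was never told the form of. This is the classic universal-approximation setup; the learning rate, hidden size, and target function are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a nonlinear function the network knows nothing about a priori.
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * x)

# One hidden layer of tanh units -- the textbook universal approximator.
n_hidden = 30
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(x)
loss0 = np.mean((pred0 - y) ** 2)   # error before any training

lr = 0.02
for _ in range(3000):
    h, pred = forward(x)
    err = pred - y                   # d(loss)/d(pred), up to a constant
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2) # backprop through tanh
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(x)
loss = np.mean((pred - y) ** 2)      # error after training
```

After training, the fit error should be well below its initial value; the point is only that the network recovers the nonlinearity from data alone.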
Also, I'm not confident in this next point, but I would love some feedback, so read it with skepticism: as I understand it, DL works so well because it combines filtering and tractability within the model structure, and, with variation in layering/neurons/optimization techniques, it opens up a greater set of potential models than many other model classes. This makes the fact that it runs so well on GPUs not a lucky accident, but an intrinsic feature of the mathematical structure of the model. Perhaps that is why we evolved NN-type structures in our brains: their tractability and parallel information-processing abilities.
To use an example from my own research: in financial econometric asset pricing, we often use a tool called the Kalman filter to filter out states of the world from sets of stochastic PDEs. Optimizing those models, once they have more than roughly ten parameters, is a real hassle. It requires a lot of optimization black magic, a quasi-scientific process in which, over months, you run different optimization algorithms on the whole model, then on single parameters, then on the whole model again, and if some parameter looks 'weird' you manually set it to what you 'think' it should be.
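For readers who haven't met it, here is a minimal Kalman filter sketch in numpy (a toy illustration of mine, far simpler than the asset-pricing models discussed above): a scalar local-level model, where a hidden random walk is observed through noise. The noise variances Q and R stand in for the kind of parameters the optimization headache is about; all the specific values here are arbitrary.

```python
import numpy as np

# Local-level (random-walk-plus-noise) model:
#   state:       x_t = x_{t-1} + w_t,  w_t ~ N(0, Q)
#   observation: y_t = x_t + v_t,      v_t ~ N(0, R)

def kalman_filter_1d(ys, Q, R, x0=0.0, P0=1.0):
    """Return filtered state estimates for the scalar local-level model."""
    x, P = x0, P0
    estimates = []
    for y in ys:
        # Predict: the state may have drifted, so uncertainty grows.
        P = P + Q
        # Update: blend prediction and observation by the Kalman gain.
        K = P / (P + R)
        x = x + K * (y - x)
        P = (1 - K) * P
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
true_state = np.cumsum(rng.normal(0, 0.1, 300))   # hidden random walk
obs = true_state + rng.normal(0, 0.5, 300)        # noisy observations

filtered = kalman_filter_1d(obs, Q=0.1 ** 2, R=0.5 ** 2)
```

The filtered track should lie closer to the hidden state than the raw observations do, which is the whole point of the filter; with more parameters (multivariate states, unknown Q and R to estimate), the optimization difficulties described above kick in.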
Basically, a neural network could learn the dynamics here (without revealing them to us) and provide a potentially equal forecast (I haven't tested this). It could also, I think, do this much faster than our optimization method. The forecast would be less useful to a human analyst, because without the model dynamics made explicit it is much harder to run simulations and study specific parameters. But it is within the realm of reason that a computer, were it self-aware, could understand how the parameters in the NN/DL model itself work.
For this reason I think the natural structure of the model makes #2 somewhat true, but as a special case of #1 (as MrMind pointed out before me).
Again, I'm not an ML research scientist, so if I've totally messed something up I'd love to know what and why.
I can’t comment usefully on everything you wrote, so I’ll just say a couple of things.
First, don't be too credulous: the field of AI has been surrounded and plagued by hype since its inception, and the current era isn't much different. Researchers have every incentive to encourage the hype.
Second, it's interesting that you bring up the Kalman filter, because it makes a nice contrast to DNNs. The Kalman filter is actually rather nice aesthetically; it has a pleasing mathematical elegance to it. People who use the KF know more or less the limits of its applicability. When I read DNN papers, I feel like the whole field has given up on the notion of aesthetics and wholeheartedly embraced architecture hacking as a methodology.
Third, I think you'll find that DNNs are much, much harder to use than you imagine or expect. The problem is that all DNN research relies on architecture hacking: write down a network, train it up, look at the result, then tweak the architecture and repeat. There is very little (embarrassingly little) theory behind it all. The phrase "we have found" is prominent in DNN papers, meaning "we tweaked the network a bunch of times in various ways and found that this trick worked best." Furthermore, each code/test/tweak cycle takes a really long time, since DNN training is, almost by definition, very time-consuming.
To address your third point first: I'm sure you're right. I have only played around with simple NNs, and shouldn't have spoken so freely about how easy it would be to estimate a more complex one when I don't know much about it.
As a follow-up to your second point: the Kalman filter is a very aesthetically pleasing model, I agree. Something I wonder about, but have no idea on, is whether there are mathematical concepts comparable to the Kalman filter (in aesthetics and usefulness) that are entirely beyond the understanding of the human brain. So, hypothetically, if we engineered humans with IQ 200+ (or whatever), would they uncover things like the Kalman filter that normal humans couldn't grasp?
If that's true, does it stand to reason that we could still use those models via a sufficiently well optimized/built DNN? We would just never understand what's going on inside the network.
I often think of self-driving cars as learning the dynamic interactions of a set of nonlinear equations that are beyond the scope of a human to ever derive.
I'll note that I realize some of my questions may be too vague or pseudo-philosophical to be answerable.
PS: I did a little internet sleuthing and have read the first ~12 pages of your book so far; it's very interesting and similar to how I think about the world (yours is much better developed). I am also incredibly interested in empirical philosophy of science and read/write/think about it a ton.