This seems to ignore regularizers that people use to try to prevent overfitting and to make their models generalize better. Isn’t that liable to give you bad intuitions versus the actual training methods people use and especially the more advanced methods of generalization that people will presumably use in the future?
“The best model” is usually regularized. I don’t think this really changes the picture compared to imagining optimizing over some smaller space (e.g. space of models with regularize<x). In particular, I don’t think my intuitions are sensitive to the difference.
I don’t understand what you mean in this paragraph (especially “since each possible parameter setting is being evaluated on what other parameter settings say anyway”)
The normal procedure is: I gather data, and am using the model (and other ML models) while I’m gathering data. I search over parameters to find the ones that would make the best predictions on that data.
I’m not finding parameters that result in good predictive accuracy when used in the world. I’m generating some data, and then finding the parameters that make the best predictions about that data. That data was collected in a world where there are plenty of ML systems (including potentially a version of my oracle with different parameters).
Yes, the normal procedure converges to a fixed point. But why do we care / why is that bad?
I wonder if you could write a fuller explanation of your views here, and maybe include your response to Stuart’s reasons for changing his mind? (Or talk to him again and get him to write the post for you. :)
I take a perspective where I want to use ML techniques (or other AI algorithms) to do useful work, without introducing powerful optimization working at cross-purposes to humans. On that perspective I don’t think any of this is a problem (or if you look at it another way, it wouldn’t be a problem if you had a solution that had any chance at all of working).
I don’t think Stuart is thinking about it in this way, so it’s hard to engage at the object level, and I don’t really know what the alternative perspective is, so I also don’t know how to engage at the meta level.
Is there a particular claim where you think there is an interesting disagreement?
Couldn’t you simulate that with Opt by just running it repeatedly?
If I care about competitiveness, rerunning OPT for every new datapoint is pretty bad. (I don’t think this is very important in the current context, nothing depends on competitiveness.)
“The best model” is usually regularized. I don’t think this really changes the picture compared to imagining optimizing over some smaller space (e.g. space of models with regularize<x). In particular, I don’t think my intuitions are sensitive to the difference.
The normal procedure is: I gather data, and am using the model (and other ML models) while I’m gathering data. I search over parameters to find the ones that would make the best predictions on that data.
I’m not finding parameters that result in good predictive accuracy when used in the world. I’m generating some data, and then finding the parameters that make the best predictions about that data. That data was collected in a world where there are plenty of ML systems (including potentially a version of my oracle with different parameters).
Yes, the normal procedure converges to a fixed point. But why do we care / why is that bad?
I take a perspective where I want to use ML techniques (or other AI algorithms) to do useful work, without introducing powerful optimization working at cross-purposes to humans. On that perspective I don’t think any of this is a problem (or if you look at it another way, it wouldn’t be a problem if you had a solution that had any chance at all of working).
I don’t think Stuart is thinking about it in this way, so it’s hard to engage at the object level, and I don’t really know what the alternative perspective is, so I also don’t know how to engage at the meta level.
Is there a particular claim where you think there is an interesting disagreement?
If I care about competitiveness, rerunning OPT for every new datapoint is pretty bad. (I don’t think this is very important in the current context, nothing depends on competitiveness.)