A point about counting arguments that I have not seen made elsewhere (although I may have missed it!).
The failure of the counting argument that SGD should result in overfitting is not a valid counterexample! There is a selection bias here: the only reason we are talking about SGD is *because* it is a good learning algorithm that does not overfit. It could well still be true that almost all counting arguments hold for almost all learning algorithms. The fact that SGD generalises well is an exception *by design*.
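To make that concrete, here's a toy enumeration (my own sketch, not from any particular source): among all Boolean functions on three bits, count how many fit four training points, and how many of those also match the target on the held-out inputs.

```python
# Toy version of the counting argument (my own illustrative example).
# Among all Boolean functions f: {0,1}^3 -> {0,1} that fit four training
# points, count how many also match the target on the held-out inputs.
from itertools import product

inputs = list(product([0, 1], repeat=3))
target = {x: x[0] for x in inputs}   # ground truth: output the first coordinate
train = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
test = [x for x in inputs if x not in train]

consistent = generalise = 0
for table in product([0, 1], repeat=len(inputs)):
    f = dict(zip(inputs, table))     # one of the 2^8 = 256 possible functions
    if all(f[x] == target[x] for x in train):
        consistent += 1
        if all(f[x] == target[x] for x in test):
            generalise += 1

print(consistent, generalise)        # -> 16 1
```

A uniform prior over this space puts 15/16 of the posterior on functions that memorise the training set and behave arbitrarily elsewhere, which is exactly the overfitting the counting argument predicts; SGD's success tells us its effective prior is nothing like uniform.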
Unless you think transformative AI won’t be trained with some variant of SGD, I don’t see why this objection matters.
Also, I think the a priori methodological problems with counting arguments in general are decisive. You always need some kind of mechanistic story for why a “uniform prior” makes sense in a particular context; you can't just assume it.
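As a minimal sketch of how much the prior choice matters (reusing the toy setup above; the "simple class" of dictator functions and constants is my own arbitrary choice, not anyone's proposed model of SGD): conditioning two different priors on identical training data gives opposite verdicts about generalisation.

```python
# Two priors over Boolean functions on {0,1}^3, conditioned on the same data
# (my own illustrative sketch). A uniform prior over truth tables predicts
# chance-level generalisation; a prior supported on a small "simple" class
# (the three dictator functions x_i plus the two constants) predicts perfect
# generalisation, because only x_0 survives conditioning.
from itertools import product

inputs = list(product([0, 1], repeat=3))
target = {x: x[0] for x in inputs}
train = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
test = [x for x in inputs if x not in train]

def posterior_accuracy(hypotheses):
    """Mean held-out accuracy of hypotheses consistent with the training set."""
    consistent = [f for f in hypotheses
                  if all(f(x) == target[x] for x in train)]
    return sum(sum(f(x) == target[x] for x in test) / len(test)
               for f in consistent) / len(consistent)

uniform = [(lambda x, f=dict(zip(inputs, t)): f[x])
           for t in product([0, 1], repeat=len(inputs))]
simple = [(lambda x, i=i: x[i]) for i in range(3)] + \
         [(lambda x: 0), (lambda x: 1)]

print(posterior_accuracy(uniform))   # 0.5  (chance level)
print(posterior_accuracy(simple))    # 1.0  (only x_0 fits the data)
```

Neither prior is privileged a priori; whether the counting argument goes through depends entirely on which prior you can justify mechanistically for the learning algorithm at hand.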