[Question] What are principled ways of penalising complexity in practice?


Previously I asked about Solomonoff induction, but essentially I asked the wrong question. Richard_Kennaway pointed me towards an answer to the question I should have asked, but after investigating I still had questions.

So:

If one has two possible models to fit to a data set, by how much should one penalise the model with an additional free parameter?

A couple of options which I came across were:

AIC, which has a flat factor of e penalty (in likelihood) for each additional parameter.

BIC, which has a factor of √n penalty for each additional parameter.

where n is the number of data points.
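To make the comparison concrete, here is a minimal sketch of scoring two models with both criteria. It assumes i.i.d. Gaussian noise, and the data, models, and parameter counts are made up for illustration rather than taken from anywhere in particular:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0, 1, n)
y = 1.0 + 0.3 * x + rng.normal(scale=0.1, size=n)  # true model has a slope

def gaussian_max_loglik(residuals):
    """Maximised Gaussian log-likelihood, plugging in the MLE of the noise variance."""
    m = len(residuals)
    sigma2 = np.mean(residuals ** 2)
    return -0.5 * m * (np.log(2 * np.pi * sigma2) + 1)

# k counts all fitted parameters, including the noise variance:
# constant model: mean + variance = 2; linear model: 2 coefficients + variance = 3.
for k, degree in [(2, 0), (3, 1)]:
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    loglik = gaussian_max_loglik(residuals)
    aic = 2 * k - 2 * loglik          # flat penalty: 2 per parameter
    bic = k * np.log(n) - 2 * loglik  # penalty grows as ln(n) per parameter
    print(f"degree {degree}: AIC = {aic:.2f}, BIC = {bic:.2f}")  # lower is better
```

Note that AIC's penalty of 2 per parameter on the deviance scale (−2 ln L) is a factor of e per parameter in likelihood, and BIC's ln(n) per parameter is a factor of √n, matching the descriptions above.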

On the one hand, a penalty which increases with n makes sense: a useful additional parameter should be able to provide more evidence the more data you have. On the other hand, a penalty which increases with n means your effective prior depends on the number of data points, which seems wrong.
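To see the n-dependence explicitly: under BIC, the extra parameter is kept only when the maximised likelihood ratio clears a threshold that grows with n,

$$\ln \hat{L}_2 - \ln \hat{L}_1 > \tfrac{1}{2}\ln n \quad\Longleftrightarrow\quad \frac{\hat{L}_2}{\hat{L}_1} > \sqrt{n},$$

so with n = 100 the larger model must improve the log-likelihood by about 2.3 nats to be preferred, and with n = 10^6 by about 6.9 nats.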

So, count me confused. Maybe there are other options which are more helpful. I don't know if the answer is too complex for a blog post, but if so, any suggestions of good textbooks on the subject would be great.

EDIT: johnswentworth has written a sequence which expands on the answer he gives below.
