I know I sound like a retrograde, but how much of that is necessary and how much can be figured out from first principles?
My 2c: some hyperparameters (e.g. learning rate) can only be determined empirically in current practice, and they make all the difference.
Other parameters are just “things that happened to work, many other things could have” (like 84x84 or the convolution sizes) and are not actually that important.