Dr_Manhattan comments on Learning from Human Preferences—from OpenAI (including Christiano, Amodei & Legg)

Dr_Manhattan 13 Jun 2017 17:58 UTC
1 point

I know I sound like a retrograde, but how much of that is necessary and how much can be figured out from first principles?

My 2c: in current practice some hyperparameters can only be determined empirically, and they make all the difference (e.g. learning rate).
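A toy illustration of the learning-rate point, not taken from the paper: the objective f(x) = x**2, the helper name `gradient_descent`, and the specific rates are all made up for the example. It is only a sketch of how sensitive even a trivial problem is to this one setting.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent; return the final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # the gradient of x**2 is 2x
    return abs(x)


# Sweep a few hypothetical learning rates and compare where each one ends up.
for lr in (0.01, 0.1, 0.5, 1.1):
    print(f"lr={lr}: |x| after 50 steps = {gradient_descent(lr):.3g}")
```

Each step multiplies x by (1 - 2*lr), so the small rates shrink x toward the minimum at different speeds, while lr=1.1 makes the factor -1.2 and the iterate grows without bound. On a real network there is no such closed form, which is why the rate ends up being found by sweeping.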

Other parameters (like the 84x84 input size or the convolution sizes) are just “things that happened to work; many other things could have”, and are not actually that important.