The problem I see with this claim is that in the academic realm, good value maximization is exactly what researchers get excited about, even to a fault. It is much easier to get a paper published by saying "our method gets a higher reward than previous methods" than "our method does some interesting thing xyz". If researchers could publish a better paperclip maximizer, they almost certainly would.
If you instead look at curiosity algorithms or reward-free (self-supervised) RL, where "success" is more ambiguous, then I would agree that the inductive biases of deep NNs probably play a bigger role than is usually acknowledged. In fact, a paper about the role of NN depth in self-supervised RL recently won a best paper award at NeurIPS: https://wang-kevin3290.github.io/scaling-crl/