At ICML there were many papers that seemed well motivated and had deep models, probably well over 5%. So the skill of having deep models is not limited to visionaries like Bengio.
To be clear, I would also expect “well over 5%”. 10-20% feels about right. When I said in the OP that the median researcher lacks deep models, I really did mean the median; I was not trying to claim 90%+.
Re: the TRPO vs PPO example, I don’t think this is getting at the thing the OP is intended to be about. It’s not about how “well-justified” a technique is mathematically. It’s about models of what’s going wrong: in this case, something to do with large update steps messing things up. Like, imagine someone who sees their training run mysteriously failing and starts babbling random things like “well, maybe it’s getting stuck in local minima”, “maybe the network needs to be bigger”, “maybe I should adjust some hyperparameters”. They try all these random things, but they don’t have any way to go figure out what’s causing the problem; they just fiddle with whatever knobs are salient and available. That person probably never figures out TRPO or PPO, because they never figure out that too-large update steps are causing problems.
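To make the contrast with knob-fiddling concrete, here is a rough sketch of what checking the “update steps are too large” hypothesis directly might look like: measure how far the policy actually moves per update. Everything here is an illustrative assumption on my part (a toy categorical policy, random logits, a random `direction` standing in for the effect of a gradient step, and the hypothetical helper `mean_kl`); it is not how TRPO or PPO were actually arrived at.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kl(old_logits, new_logits):
    """Mean KL(old || new) over a batch of states, for categorical policies."""
    logp = log_softmax(old_logits)
    logq = log_softmax(new_logits)
    return float(np.mean(np.sum(np.exp(logp) * (logp - logq), axis=-1)))

rng = np.random.default_rng(0)
old_logits = rng.normal(size=(64, 4))  # assumed toy setup: 64 states, 4 actions
direction = rng.normal(size=(64, 4))   # stand-in for one update's effect on the logits

# How far does the policy move as the step size grows?
step_sizes = (0.01, 0.1, 1.0, 10.0)
kls = [mean_kl(old_logits, old_logits + s * direction) for s in step_sizes]
for s, kl in zip(step_sizes, kls):
    print(f"step_size={s:<5} mean KL={kl:.6f}")
```

If a failing run shows the per-update KL spiking right before things fall apart, that is evidence for the “steps are too large” model, and it points at constraining or clipping the update rather than at making the network bigger.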