If I understand your idea, you propose that new people will try to think of new ideas, and when they say “How about A?”, someone more “mature” says, “No, that won’t work because of X”, then they say “How about B?”, and get the response “No, that won’t work because of Y”, and so forth, until finally they say “How about Q?”, and Q is something no-one has thought of before, and so is worth investigating.
It could be that a new Q is what’s needed. But might it instead be that “won’t work because of Y” is flawed, and what is needed is someone who can see that flaw? It doesn’t seem like this proposal would encourage discovery of such a flaw, once the new person is accustomed to listening to the “mature” person’s dismissal of “non-working” ideas.
This seems like it might be a situation where personal interaction is counterproductive. Of course the new person should learn something about past work. But it’s easier to question that past work, and persist in trying to think of how to make B work, when the dismissals of B as not workable are in papers one is reading, rather than in personal conversation with a mentor.
The research community is very far from being efficient.
One of my own fields of research is Markov chain Monte Carlo methods, and their application to computation for Bayesian models. Markov chain Monte Carlo (MCMC) was invented in the early 1950s, for use in statistical physics. It was not used by Bayesian statisticians until around 1990. There was no reason that it could not have been used before then—the methods of the 1950s could have been directly applied to many Bayesian inference problems.
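To see in miniature why the 1950s methods apply directly, here is a minimal sketch of a random-walk Metropolis sampler for a toy Bayesian problem (my own illustrative example, not from the original discussion): inferring the mean of a Normal model with known unit variance, under a broad Normal prior. The data values and tuning constants are arbitrary choices for the illustration.

```python
import math
import random

random.seed(1)

# Toy data: observations assumed drawn from Normal(mu, sd=1).
data = [1.2, 0.7, 1.9, 1.4]

def log_post(mu):
    """Unnormalized log posterior: Normal(0, sd=10) prior times likelihood."""
    lp = -mu * mu / (2 * 10.0 ** 2)              # log prior (up to a constant)
    lp += sum(-(x - mu) ** 2 / 2 for x in data)  # log likelihood (up to a constant)
    return lp

def metropolis(n_iter, step=0.5):
    """Random-walk Metropolis: propose, then accept with prob min(1, ratio)."""
    mu = 0.0
    samples = []
    for _ in range(n_iter):
        prop = mu + random.gauss(0, step)  # symmetric proposal
        if math.log(random.random()) < log_post(prop) - log_post(mu):
            mu = prop                      # accept; otherwise keep current mu
        samples.append(mu)
    return samples

samples = metropolis(20000)
# Discard burn-in, then estimate the posterior mean from the chain.
post_mean = sum(samples[5000:]) / len(samples[5000:])
```

Nothing here goes beyond what was available in the early 1950s: only the ratio of unnormalized posterior densities is needed, which is exactly what makes Metropolis-style methods suit Bayesian computation.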
In 1970, a paper generalizing the most common MCMC algorithm (the “Metropolis” method) was published in Biometrika, one of the top statistics journals. This didn’t prompt anyone to start using it for Bayesian inference.
In the early 1980s, MCMC was used by some engineers and computer scientists (eg, by Geoffrey Hinton for maximum likelihood inference for log-linear models with latent variables, also known as “Boltzmann machines”). This also didn’t prompt anyone to start using it for Bayesian inference.
After a form of MCMC started being used by Bayesian statisticians around 1990, it took many years for the literature on MCMC methods used by physicists to actually be used by statisticians. This despite the fact that I wrote a review paper in 1993 describing just about all of these methods in terms readily accessible to statisticians.
In 1992, I started using the Hamiltonian Monte Carlo method (aka, hybrid Monte Carlo, or HMC) for Bayesian inference for neural network models. This method was invented by physicists in 1987. (It could have been invented in the 1950s, but just wasn’t.) I demonstrated that HMC was often hundreds or thousands of times faster than simpler methods, gave talks on this at conferences, and wrote my thesis (later book) on Bayesian learning in which this was a major theme. It wasn’t much used by other statisticians until after I wrote another review paper in 2010, which for some reason led to it catching on. It is now widely used in packages such as Stan.
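For readers unfamiliar with HMC, here is a minimal sketch of one HMC update (my own illustrative code, with a standard Normal target and arbitrary tuning values): momentum is resampled, the position and momentum are evolved by leapfrog integration of Hamiltonian dynamics, and the endpoint is accepted or rejected based on the change in total energy.

```python
import math
import random

random.seed(2)

# Target: standard Normal, so the potential energy (negative log density,
# up to a constant) is U(q) = q^2/2, with gradient grad_U(q) = q.
def U(q):
    return q * q / 2

def grad_U(q):
    return q

def hmc_step(q, eps=0.2, L=20):
    """One HMC update: leapfrog trajectory plus Metropolis accept/reject."""
    p = random.gauss(0, 1)                   # resample momentum
    q_new, p_new = q, p
    p_new -= eps * grad_U(q_new) / 2         # initial half step for momentum
    for i in range(L):
        q_new += eps * p_new                 # full step for position
        if i < L - 1:
            p_new -= eps * grad_U(q_new)     # full step for momentum
    p_new -= eps * grad_U(q_new) / 2         # final half step for momentum
    # Accept with probability min(1, exp(current_H - proposed_H)).
    current_H = U(q) + p * p / 2
    proposed_H = U(q_new) + p_new * p_new / 2
    if math.log(random.random()) < current_H - proposed_H:
        return q_new
    return q

q = 0.0
draws = []
for _ in range(5000):
    q = hmc_step(q)
    draws.append(q)

mean = sum(draws) / len(draws)
var = sum(d * d for d in draws) / len(draws)
```

The gain over random-walk Metropolis comes from the long leapfrog trajectories, which let the sampler make distant, nearly independent moves that are still accepted with high probability; that is the source of the large speedups mentioned above.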
Another of my research areas is error-correcting codes. In 1948, Claude Shannon proved his noisy coding theorem, establishing the theoretical (but not practical) limits of error correction. In 1963, Robert Gallager invented Low Density Parity Check (LDPC) codes. For many years after this, standard textbooks stated that the theoretical limit Shannon proved to be achievable was unlikely ever to be closely approached by codes with practical encoding and decoding algorithms. In 1996, David MacKay and I showed that a slight variation on Gallager's LDPC codes comes very close to achieving the Shannon limit on performance. (A few years before then, "Turbo codes" had achieved similar performance.) These and related codes are now very widely used.
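For readers unfamiliar with the underlying idea, here is a toy illustration of parity-check codes (my own example, not a real LDPC code): a small binary matrix H defines the code, a vector is a codeword exactly when H times it is zero mod 2, and a nonzero "syndrome" signals errors. Real LDPC codes use very large, very sparse H matrices and iterative decoding, which this sketch does not attempt.

```python
# Toy parity-check matrix H for a length-6 binary code (illustrative only).
# A word c is a codeword iff H @ c = 0 (mod 2).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def syndrome(word):
    """Compute H times word, mod 2; an all-zero result means a valid codeword."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

codeword = [1, 0, 1, 1, 1, 0]      # satisfies all three parity checks
received = [1, 1, 1, 1, 1, 0]      # same word with bit 1 flipped in transit
ok = syndrome(codeword)            # [0, 0, 0]: no errors detected
err = syndrome(received)           # nonzero: the checks involving bit 1 fail
```

Because each row of H involves only a few bits, checks can be evaluated cheaply, and decoding can pass messages locally between bits and checks; that sparsity is what makes LDPC decoding practical at scale.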
These are examples of good ideas that took far longer to be widely used than one would expect in an efficient research community. There are also many bad ideas that persist for far longer than they should.
I think both problems are at least partially the result of perverse incentives of researchers.
Lots of research is very incremental—what you describe as “...there was instantly an explosion of activity as researchers raced to apply it to all the important NLP problems and be the first to publish”. Sometimes, of course, this explosion of activity is useful. But often it is not—the idea isn't actually very good, it's just the sort of idea on which it is easy to write more and more papers, often precisely because it isn't very good. And sometimes this explosion of activity doesn't happen when it would have been useful, because the activity required is not the sort that leads to easy papers. Perhaps the needed activity is to apply the idea to practical problems, but that isn't the “novel” research that leads to tenure; or the idea requires learning some new tools, and that's too much trouble; or the way forward is messy empirical work that doesn't look as impressive as proving theorems (even if the theorems are actually pointless); or extending an idea that someone else came up with doesn't seem like as good a career move as developing your own ideas (even when your ideas aren't as good).
The easy rewards from incremental research may mean that researchers don’t spend much, or any, time on thinking about actual original ideas. Getting such ideas may require reading extensively in diverse fields, and getting one’s hands dirty with the low-level work that is necessary to develop real intuition about how things work, and what is important. Academic researchers can’t easily find time for this, and may be forced into either doing incremental research, or becoming research managers rather than actual researchers.
In my case, the best research environment was when I was a PhD student (with Geoffrey Hinton). But I’m not sure things are still as good for PhD students. The level of competition for short-term rewards may be higher than back in the 1990s.