Ah, Claude helped me remember the historical parallel that serves as an intuition pump: in the early days of the deep learning revolution, Hinton and Bengio found it extremely useful to do unsupervised learning on a network first, before doing supervised learning. The post-unsupervised-learning network ended up in the basin of better local optima because it already represented key concepts.
Analogously, I expect that initializing an RL algorithm with a good predictive network makes it massively better and more efficient.