[Question] Does the lottery ticket hypothesis suggest the scaling hypothesis?

The lottery ticket hypothesis, as I (vaguely) understand it, is that artificial neural networks tend to work in the following way: when the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as not to interfere.
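For concreteness, the procedure the original paper (Frankle & Carbin, 2019) uses to find such sub-networks is iterative magnitude pruning: train, prune the smallest-magnitude weights, rewind the survivors to their initial values, and repeat. Here is a minimal sketch in PyTorch, where `train_fn`, `prune_frac`, and `rounds` are hypothetical stand-ins rather than anything from the paper's code:

```python
import copy

import torch


def find_winning_ticket(model, train_fn, prune_frac=0.2, rounds=5):
    """Iterative magnitude pruning with rewinding (Frankle & Carbin, 2019).

    `train_fn(model)` is a stand-in for a full training loop and is
    assumed to keep masked weights at zero (e.g. by re-applying the
    mask after each optimizer step).
    """
    # Save the random initialization so surviving weights can be rewound to it.
    init_state = copy.deepcopy(model.state_dict())
    # Prune only weight matrices, not biases.
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)  # 1. train the currently-masked network
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            # 2. drop the smallest-magnitude weights among the survivors
            surviving = p.data[masks[name].bool()].abs()
            threshold = surviving.quantile(prune_frac)
            masks[name] *= (p.data.abs() > threshold).float()
        # 3. rewind: restore the original init, then zero out pruned weights
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
    return model, masks
```

The "winning ticket" claim is then that the sparse network this returns, retrained from `init_state`, matches the accuracy of the full dense network.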

By the scaling hypothesis I mean that in the next five years, many other architectures besides the transformer will also be shown to get substantially better as they get bigger. I'm also interested in defining it differently, as whatever Gwern is talking about.
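For reference, the quantitative result this kind of claim usually points at (and that Gwern's writing on the scaling hypothesis draws on) is the power-law fit of Kaplan et al. (2020) for transformer language models, where test loss falls smoothly with non-embedding parameter count $N$:

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076,$$

with $N_c$ a fitted constant. Read that way, the hypothesis is that curves of this shape will keep appearing for architectures other than the transformer.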