[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

In this paper, the authors explore the scaling properties of mixed-modal generative models and discover new scaling laws that unify the contributions of individual modalities and the interactions between them. What I find most interesting is the so-called competition barrier: below a certain number of parameters/amount of data, jointly trained modalities compete and the loss is higher than if each modality were trained independently, but past that threshold they begin to synergize and the joint loss drops below the independent ones. This seems to predict the kind of cross-modal transfer that was sought after but not found (yet) with Gato.
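To make the barrier concrete, here is a minimal numeric sketch. Everything in it is an assumption made up for illustration: the per-modality constants, the Chinchilla-style form L(N) = E + A/N^α at fixed data, and a joint loss modeled as the average unimodal loss plus a competition penalty that decays with scale minus a fixed synergy bonus. None of these are the paper's fitted laws; the sketch only shows how a crossover point (the barrier) falls out of such a setup.

```python
# Illustrative sketch of a "competition barrier" between two modalities.
# All constants are invented for illustration; they are NOT the paper's
# fitted coefficients, and the joint-loss form is a simplified stand-in
# for the paper's actual mixed-modal scaling law.

def unimodal_loss(N: float, E: float, A: float, alpha: float) -> float:
    """Chinchilla-style loss at fixed data: L(N) = E + A / N^alpha."""
    return E + A / N**alpha

def joint_loss(N: float, li: float, lj: float,
               C: float = 80.0, gamma: float = 0.35, S: float = 0.05) -> float:
    """Hypothetical mixed-modal loss: the average unimodal loss, plus a
    competition penalty C / N^gamma that fades with scale, minus a fixed
    synergy bonus S from cross-modal transfer."""
    return 0.5 * (li + lj) + C / N**gamma - S

# Invented per-modality constants (E, A, alpha), e.g. "text" and "image".
text = (1.8, 150.0, 0.30)
image = (2.5, 200.0, 0.28)

barrier = None
for exp in range(6, 13):  # model sizes from 1e6 to 1e12 parameters
    N = 10.0 ** exp
    li = unimodal_loss(N, *text)
    lj = unimodal_loss(N, *image)
    joint = joint_loss(N, li, lj)
    independent = 0.5 * (li + lj)
    regime = "synergy" if joint < independent else "competition"
    print(f"N=1e{exp:02d}  independent={independent:.3f}  "
          f"joint={joint:.3f}  ({regime})")
    if barrier is None and joint < independent:
        barrier = N

if barrier is not None:
    print(f"\nIllustrative competition barrier: ~{barrier:.0e} parameters")
```

With these made-up numbers the crossover lands around 1e10 parameters: below it the competition penalty dominates and joint training hurts; above it the synergy bonus wins. The paper's contribution is fitting laws of roughly this flavor to real mixed-modal training runs, so the barrier's location can be predicted rather than guessed.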