The Chinchilla scaling laws would predict faster progress.
(But we wouldn’t observe that on these graphs because they weren’t trained Chinchilla-style, of course.)
(But we wouldn’t observe that on these graphs because they weren’t trained Chinchilla-style, of course.)