nostalgebraist comments on [Link] Training Compute-Optimal Large Language Models