Second-order and related optimizers. Sophia reports halving the number of steps and total FLOPs on GPT-style pre-training, while Lion reports up to 5× savings on JFT; I count this conservatively as a 1.5–2× saving.
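For context on what these claims are about, here's a rough NumPy sketch of the two update rules as I understand the papers' pseudocode: Lion takes the sign of an interpolation between the gradient and momentum, while Sophia preconditions momentum by a diagonal Hessian estimate (refreshed only every few steps, which is omitted here) with per-coordinate clipping. The hyperparameter defaults are placeholders, not the papers' tuned values.

```python
import numpy as np

def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: step in the direction of the sign of an
    interpolated momentum/gradient, plus decoupled weight decay."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    theta = theta - lr * (update + wd * theta)
    m = beta2 * m + (1 - beta2) * grad   # momentum updated after the step
    return theta, m

def sophia_step(theta, m, h, grad, lr=1e-4, beta1=0.96, rho=0.04, eps=1e-12):
    """One Sophia update: momentum divided by a diagonal Hessian
    estimate h (assumed refreshed elsewhere), clipped per coordinate."""
    m = beta1 * m + (1 - beta1) * grad
    precond = m / np.maximum(rho * h, eps)
    theta = theta - lr * np.clip(precond, -1.0, 1.0)
    return theta, m
```

The point of the sketch is just that neither rule is exotic to implement; the contested part is whether the reported step/FLOP savings hold up at large scale.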
I think it’s kinda commonly accepted wisdom that the right heuristic for optimizers claiming savings like this is “they’re probably bullshit,” at least until they get used in big training runs.
Like I don’t have a specific source for this, but a lot of optimizers claiming big savings are out there and few get adopted.