Note that the GPT-4 paper predicted GPT-4's performance from experiments scaled down by 1000x!
Do you think they knew how GPT-4.5 would perform before throwing so much compute at it, only for it to end up a failure? I'm sure they ran plenty of scaled-down experiments for GPT-4.5 too!
I heard rumors that GPT-4.5 got good pretraining loss but bad downstream performance. If that's true, the loss scaling laws may have worked correctly. If not, a lot of things can go wrong and something evidently did, whether hardware issues, software bugs, machine-learning problems, or flaws in their earlier scaled-down experiments.
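For anyone curious what that kind of prediction looks like mechanically: the GPT-4 report fit a power law with an irreducible-loss term to its small runs and extrapolated out. Here's a minimal sketch of the idea; all the numbers are made up for illustration, and this is obviously not OpenAI's actual data or code:

```python
# Minimal sketch of loss scaling-law extrapolation (illustrative only).
# Fit L(C) = a * C**(-b) + L0 to small runs, then predict a ~1000x larger run.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(c, a, b, L0):
    # Final loss vs. compute: power-law decay toward an irreducible floor L0.
    return a * c ** (-b) + L0

# Hypothetical (compute, final loss) pairs from scaled-down runs,
# with compute in units of 1e18 FLOPs to keep the fit well conditioned.
compute = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
loss = np.array([3.10, 2.85, 2.62, 2.45, 2.31])

(a, b, L0), _ = curve_fit(scaling_law, compute, loss, p0=[1.0, 0.2, 2.0])

# Extrapolate ~1000x past the largest small run.
big = 1e5  # i.e. 1e23 FLOPs in these units
print(f"fit: a={a:.2f}, b={b:.2f}, floor={L0:.2f}")
print(f"predicted loss at 1000x scale: {scaling_law(big, a, b, L0):.2f}")
```

The catch, per the rumor above, is that a fit like this only predicts *loss*; it says nothing about whether that loss translates into good downstream performance.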