I buy that 1 and 4 are the case, combined with DeepSeek probably being satisfied that GPT-4-level models were achieved.
Edit: I did not mean to imply that the GPT-4ish neighbourhood is where LLM pretraining plateaus at all, @Thane Ruthenis.