You’re missing the possibility that the model had more parameters during training than the models used for inference. It is now common practice to train a large model and then distill it into a series of smaller models that can be deployed according to task needs.
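For readers unfamiliar with distillation: a minimal sketch of the core idea, assuming a Hinton-style soft-label objective where the student is trained to match the teacher's temperature-softened output distribution. The function names and logits here are illustrative, not from any specific model.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among non-top classes ("dark knowledge").
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student whose logits match the teacher's incurs zero loss;
# any mismatch produces a positive loss to minimize.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

In practice this soft-label term is typically mixed with the ordinary hard-label cross-entropy, and the smaller student architecture is what ships for inference.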
dhar174
Karma: 2
- dhar174 8 Apr 2023 17:24 UTC · 1 point · in reply to: GödelPilled’s comment on: GPT-4 Specs: 1 Trillion Parameters?
To those that believe language models do not have internal representations of concepts:
I can help at least partially disprove the assumptions behind that.
There is convincing evidence otherwise, as demonstrated through an actual experiment with a GPT model trained on Othello moves:
https://thegradient.pub/othello/ The researchers’ conclusion: