important to note that gpt4 is more like 300x scale equivalent of gpt3, not 100x, based on gpt4 being trained with (rumored) 2e25 flops vs contemporary gpt3-level models (llama2-7b) being trained on 8e22 flops ( 250 times the compute for that particular pair)
important to note that gpt4 is more like 300x scale equivalent of gpt3, not 100x, based on gpt4 being trained with (rumored) 2e25 flops vs contemporary gpt3-level models (llama2-7b) being trained on 8e22 flops ( 250 times the compute for that particular pair)