As for inference speeds, it’s worth noting that OpenAI inference speeds can vary substantially and tend to decrease over time after the release of a new model.
See Fabien’s lovely website for results over time.
In particular, if we look at GPT-4-1106-preview, the results indicate that it ran at around 3000 tokens/minute shortly after release, but more recent measurements are only around 1200 tokens/minute.
Similarly, GPT-3.5-turbo-1106 achieved about 6000 tokens/minute around the time of release, but this has since decreased to more like 3000 tokens/minute.
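As a rough illustration of how a tokens/minute figure like the ones above can be measured, here is a minimal Python sketch. The `fake_generate` stand-in is hypothetical; in practice one would wrap an actual API call and count the completion tokens reported in the response.

```python
import time

def tokens_per_minute(generate, n_trials=3):
    """Estimate throughput of a token-generating callable.

    `generate` should return the number of tokens produced per call;
    in practice it would wrap an API request and return something like
    the completion-token count from the response's usage field.
    """
    total_tokens = 0
    start = time.monotonic()
    for _ in range(n_trials):
        total_tokens += generate()
    elapsed = time.monotonic() - start
    # Convert tokens/second to tokens/minute.
    return total_tokens / elapsed * 60

# Hypothetical stand-in: simulates ~100 tokens generated in ~0.1 s per call.
def fake_generate():
    time.sleep(0.1)
    return 100

rate = tokens_per_minute(fake_generate)
print(f"{rate:.0f} tokens/minute")
```

Averaging over several trials matters because, as the numbers above show, observed throughput is noisy and drifts over time, so a single request is not a reliable estimate.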