As for inference speeds, it’s worth noting that OpenAI inference speeds can vary substantially and tend to decrease over time after the release of a new model.
See Fabien’s lovely website for results over time.
In particular, if we look at GPT-4-1106-preview, the results indicate that it ran at around 3000 tokens/minute shortly after release, but more recent measurements are only around 1200 tokens/minute.
Similarly, GPT-3.5-turbo-1106 achieved about 6000 tokens/minute around the time of release, but this has since decreased to more like 3000 tokens/minute.
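As a rough illustration of how a tokens/minute figure like the ones above can be measured, here is a minimal Python sketch. The `fake_generate` stand-in is hypothetical; in practice one would wrap an actual API call and count the completion tokens reported in the response.

```python
import time

def tokens_per_minute(generate, n_trials=3):
    """Estimate throughput of a token-generating callable.

    `generate` should return the number of tokens produced per call;
    in practice it would wrap an API request and return something like
    the completion-token count from the response's usage field.
    """
    total_tokens = 0
    start = time.monotonic()
    for _ in range(n_trials):
        total_tokens += generate()
    elapsed = time.monotonic() - start
    # Convert tokens/second to tokens/minute.
    return total_tokens / elapsed * 60

# Hypothetical stand-in: simulates ~100 tokens generated in ~0.1 s per call.
def fake_generate():
    time.sleep(0.1)
    return 100

rate = tokens_per_minute(fake_generate)
print(f"{rate:.0f} tokens/minute")
```

Averaging over several trials matters because, as the numbers above show, observed throughput is noisy and drifts over time, so a single request is not a reliable estimate.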