because the cost and complexity of keeping models publicly available for inference scale roughly linearly with the number of models we serve.
What about OpenAI, then, with O(100) models in its API?