You aren’t directly paying more for it pro rata the way you would through the API, but you do get fewer queries: longer conversations hit the rate limits sooner, since LLM inference cost grows as O(n²) in context length.
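For a rough sense of the scaling (with made-up numbers), here's a minimal sketch of why long chats burn through limits faster: every turn resends the whole history, so the total tokens the provider processes grow roughly quadratically with the number of turns, on top of attention itself being O(n²) in context length within a single forward pass.

```python
def cumulative_prompt_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Total prompt tokens processed over a whole conversation,
    assuming each new turn resends all prior turns as context.
    (tokens_per_turn is an illustrative, hypothetical figure.)"""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the new message joins the history
        total += history            # the full history is reprocessed this turn
    return total

for n in (5, 10, 20, 40):
    print(n, cumulative_prompt_tokens(n))
# Doubling the number of turns roughly quadruples the total tokens processed,
# which is why providers throttle long conversations sooner.
```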