asyncio is very important to learn for empirical LLM research since it usually involves many concurrent API calls
I have lots of asyncio experience, but I’ve never seen a reason to use it for concurrent API calls: concurrent.futures, and especially ThreadPoolExecutor, works just as well for concurrent API calls and is more convenient than asyncio (you don’t need await, you don’t need the event loop, etc.).
Am I missing something? Or is this just a matter of taste?
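For reference, here is a minimal sketch of the ThreadPoolExecutor approach I mean; `call_api` is a hypothetical stand-in for whatever blocking client call you actually make:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_api(prompt: str) -> str:
    # Placeholder for a blocking HTTP/SDK call to an LLM API.
    time.sleep(0.01)
    return f"response to {prompt!r}"

prompts = [f"prompt {i}" for i in range(20)]

# The pool caps concurrency at max_workers; submit returns immediately
# and the calls run in parallel on worker threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(call_api, p) for p in prompts]
    results = [f.result() for f in as_completed(futures)]
```

No coroutines, no event loop — the blocking calls just run on worker threads.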
I recently switched from using threads to using asyncio, even though I had never used asyncio before.
It was a combination of:
- Me using cheaper “batch” LLM API calls, which can take hours to return a result
- Therefore wanting to run many thousands of tasks in parallel from within one program (to make up for the slow sequential speed of each task)
But at some point, the thread pool raised a generic “can’t start a new thread” exception without much more information. It must have hit a limit somewhere (memory? a hard-coded thread limit?), though I couldn’t work out where.
Maybe the general point is that threads have more overhead, and if you’re doing many thousands of things in parallel, asyncio can handle it more reliably.
Threads are managed by the OS, and each thread carries overhead in start-up and context switching. asyncio coroutines are more lightweight because they are scheduled within the Python runtime (rather than by the OS) and share the main thread’s memory. This lets you run tens of thousands of coroutines at once, which isn’t possible with threads AFAIK. So I recommend asyncio for LLM API calls, since in my experience I often need to scale up to thousands of concurrent requests. In my opinion, learning asyncio is very high ROI for empirical research.
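To make that concrete, here is a minimal sketch of running thousands of coroutines with a semaphore to cap how many are in flight at once; `call_api` is a hypothetical stand-in for an async client call:

```python
import asyncio

async def call_api(prompt: str) -> str:
    # Placeholder for an async SDK/HTTP request to an LLM API.
    await asyncio.sleep(0.01)
    return f"response to {prompt!r}"

async def bounded_call(sem: asyncio.Semaphore, prompt: str) -> str:
    # The semaphore limits how many calls run concurrently,
    # which most API rate limits force you to do anyway.
    async with sem:
        return await call_api(prompt)

async def main() -> list[str]:
    sem = asyncio.Semaphore(1000)  # at most 1000 in-flight requests
    prompts = [f"prompt {i}" for i in range(10_000)]
    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded_call(sem, p) for p in prompts))

results = asyncio.run(main())
```

Ten thousand coroutines like this is routine for the event loop; ten thousand OS threads is where I hit the “can’t start a new thread” wall.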