Hi. A question here.
“The system executes code in a Docker container with strict resource limits (TITAN V GPU with 12GB memory, 600-second timeout). This ensures fair comparison between models and tests their ability to work within realistic constraints.”
How can you run llama-3.1/3.3-70b models with 12GB of VRAM?
The LLMs are presented with the ML task and write Python code to solve it. That Python code is what runs in the isolated Docker container with 12GB of memory.
So the LLMs themselves are not run on the TITAN V; they are mostly called through an API. I did run a number of the LLMs locally through Ollama, just not on the TITAN V server but on a larger one.
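For anyone curious, the execution side could be as simple as a `docker run` call with a GPU pin and a hard timeout. Here is a minimal sketch in Python; the image name, mount layout, and network isolation are my assumptions, not details from the post:

```python
import subprocess

def run_generated_code(script_path: str, timeout_s: int = 600):
    """Execute an LLM-generated script inside an isolated container.

    Hypothetical sketch only: the image name, mount layout, and
    network isolation are assumptions, not the authors' exact setup.
    """
    cmd = [
        "docker", "run", "--rm",
        "--gpus", "device=0",              # pin to one GPU (a TITAN V exposes 12GB VRAM)
        "--network", "none",               # assumed: no network access inside the sandbox
        "-v", f"{script_path}:/workspace/solution.py:ro",
        "ml-task-sandbox:latest",          # hypothetical image name
        "python", "/workspace/solution.py",
    ]
    # The 600-second budget from the post; TimeoutExpired is raised on overrun.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
```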
Thanks for the clarification.