Could one package it together with OS and everything in some sort of container and have it work indefinitely (if perhaps not very efficiently) without any support?
Could we solve the efficiency problem by creating a system where one files a request to load a model to GPUs in advance (and, perhaps, by charging for time GPUs are occupied in this fashion)?
you could plausibly do this, and it would certainly reduce the maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer-generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.
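to make the "file a request in advance and pay for the time GPUs are occupied" idea concrete, here is a minimal sketch in Python. everything in it is an assumption for illustration: the names `Reservation`, `GpuPool`, `reserve`, and the flat per-GPU-hour price are hypothetical, and it only models booking and billing, not actually loading a container onto hardware.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    model_id: str    # hypothetical name of the archived model container
    start: datetime  # when the model must be resident on the GPUs
    end: datetime    # when the GPUs are released again
    gpus: int        # GPUs held for the whole window


class GpuPool:
    """Hypothetical pool that accepts advance reservations and bills for GPU time held."""

    def __init__(self, total_gpus, price_per_gpu_hour):
        self.total_gpus = total_gpus
        self.price_per_gpu_hour = price_per_gpu_hour  # assumed flat rate
        self.reservations = []

    def peak_usage(self, start, end):
        """Highest number of GPUs already booked at any moment in [start, end)."""
        overlapping = [r for r in self.reservations if r.start < end and start < r.end]
        # Usage can only rise at the window start or when a reservation begins.
        points = [start] + [r.start for r in overlapping if r.start > start]
        return max(
            sum(r.gpus for r in overlapping if r.start <= t < r.end) for t in points
        )

    def reserve(self, model_id, start, end, gpus):
        """Book the window if capacity allows; return the charge for the held time."""
        if end <= start or gpus <= 0:
            raise ValueError("reservation needs a positive duration and GPU count")
        if self.peak_usage(start, end) + gpus > self.total_gpus:
            raise ValueError("not enough free GPUs in that window")
        self.reservations.append(Reservation(model_id, start, end, gpus))
        hours = (end - start).total_seconds() / 3600
        # Billing is for occupancy, whether or not any requests are served.
        return gpus * hours * self.price_per_gpu_hour


pool = GpuPool(total_gpus=8, price_per_gpu_hour=2.0)
t0 = datetime(2030, 1, 1, 12, 0)
cost = pool.reserve("old-model", t0, t0 + timedelta(hours=3), gpus=4)
print(cost)  # 4 GPUs * 3 hours * 2.0 per GPU-hour = 24.0
```

charging for the whole reserved window rather than per request is what pushes the idle-GPU cost onto whoever asked for the old model, which is the efficiency point in the question.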