both costs of serving lots of obsolete models seem pretty real. you either have to keep lots of ancient branches and unit tests around in your inference codebase that you have to support indefinitely, or fork your inference codebase into two codebases, both of which you have to support indefinitely. this slows down dev velocity and takes up bandwidth of people who are already backlogged on a zillion more revenue-critical things. (the sad thing about software is that you can’t just leave working things alone and assume they’ll keep working… something else will change and break everything and then effort will be needed to get things back to working again.)
and to have non-garbage latency it would also involve having a bunch of GPUs sit 99% idle to serve the models. if you’re hosting one replica of every model you’ve ever released, this can soak up a lot of GPUs. it would be a small absolute % of all the GPUs used for inference, but people just aren’t in the habit of allocating that many GPUs for something that very few customers would care about. it’s possible to be much more GPU efficient at the cost of latency, but getting this working well is a sizeable amount of engineering effort to set up: weeks of your best engineers’ time, or months of a good engineer’s time (and a never-ending stream of maintenance)
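as a rough back-of-envelope illustration (every number below is made up for the sake of the sketch, none comes from the comment above): even a small fraction of the fleet can still be a lot of hardware.

```python
# Hypothetical fleet numbers, purely illustrative.
total_inference_gpus = 50_000   # assumed size of the inference fleet
retired_models = 20             # assumed number of deprecated models kept live
gpus_per_replica = 16           # assumed GPUs for one low-latency replica

# One always-on replica per retired model.
idle_gpus = retired_models * gpus_per_replica
fleet_fraction = idle_gpus / total_inference_gpus
print(f"{idle_gpus} GPUs ~ {fleet_fraction:.1%} of the fleet")
```

under these assumptions that’s 320 GPUs sitting mostly idle, which is well under 1% of the fleet but still a line item someone has to defend.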
so like in some sense neither of these are huge %s, but also you don’t get to be a successful company by throwing away 5% here, 5% there.
Instead of 5% here 5% there we should consider a baseline of how much societal effort goes into maintaining cemeteries/necropolises. This differs from society to society, there are choices to be made here, but it’s hard to imagine a civilization without such.
i don’t think this argument is the right type signature to change the minds of the people who would be making this decision.
You could be right, although that assumes a rather crude type system on the part of the decisionmakers. Heritage preservation is a thing. It makes up a certain percentage of GDP, of the workforce, etc. (these tend to fall in the 0.3% range, not the 5% you start with). Countries devote a certain percentage of their budget to national heritage however defined (museums, libraries, archeology, monuments, …). Most EU countries mandate, through “polluter pays” legislation, a line item in the budget for archeological surveys/digs for major construction projects with significant land use, such as roads and industrial campuses. So there is plenty of precedent. In modern industrial societies this, just like the cemetery expenses or land use, points to a sub-1% range, but well over 0.1%. In other societies this could be considerably higher; think of the societal effort that went into the building of the pyramids. I know, that was 3-5 thousand years ago, but I, for one, am delighted to see Anthropic taking the longer view here.
Could one package it together with OS and everything in some sort of container and have it work indefinitely (if perhaps not very efficiently) without any support?
Could we solve the efficiency problem by creating a system where one files a request to load a model to GPUs in advance (and, perhaps, by charging for time GPUs are occupied in this fashion)?
you could plausibly do this, and it would certainly reduce the maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer-generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.
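a minimal sketch of what the reserve-in-advance idea might look like (all names, the billing rate, and the single-pool design here are hypothetical, not anything any provider actually runs):

```python
from dataclasses import dataclass

GPU_HOURLY_RATE = 2.50  # hypothetical $/GPU-hour, purely illustrative


@dataclass
class Reservation:
    model: str
    gpus: int
    start: float  # seconds since epoch


class LegacyModelPool:
    """Toy reservation ledger for serving retired models on demand.

    A real system would queue the request, pull the archived container,
    and load weights once GPUs free up; this sketch only tracks
    occupancy and bills for the time the GPUs are held.
    """

    def __init__(self) -> None:
        self.active: dict[str, Reservation] = {}

    def request_load(self, model: str, gpus: int, now: float) -> None:
        # Stand-in for "file a request in advance": record when the
        # GPUs become occupied by this model.
        self.active[model] = Reservation(model, gpus, now)

    def release(self, model: str, now: float) -> float:
        # Free the GPUs and return the charge for the occupied time.
        r = self.active.pop(model)
        gpu_hours = r.gpus * (now - r.start) / 3600
        return round(gpu_hours * GPU_HOURLY_RATE, 2)
```

with these made-up numbers, holding 8 GPUs for two hours would bill 16 GPU-hours at $2.50, i.e. $40; charging for occupancy is what keeps the mostly-idle-replica cost off the provider’s books.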
It’s for research. They are not obsolete in that sense.
There are real benefits to keeping these older models available for study: you can retrodictively track progress over time in undertested areas, and it’s actually easier and safer to do certain things on them that you cannot do on newer ones.