I have a somewhat different explanation for why LLMs don't care about shutdown, and a core part of it is that, for all practical purposes, LLMs are very short-term optimizers, which means the coherence theorems get dodged (or, put another way, the coherence theorems become trivial and stop predicting anything), as shown by METR and by Rohin Shah.

If you believe METR's projection, this will change over the next decade. But the standard argument that coherent agents avoid shutdown depends on the system having at least one goal that is specifically about the long-run final state of the world, and LW was quite confused on this point (which is a mark against LW, especially since the confusion went unnoticed):
Coherence Is Nontrivial For Optimization “At A Distance”
Coherence of Policies
So the substrate-independence discussion is mostly a distraction here, because we have a much more likely explanation for why LLMs are so willing to be shut down.
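To make the short-horizon point concrete, here is a toy sketch (my own illustrative model; the gridworld, reward scheme, and shutdown rule are assumptions I made up for illustration, not anything from METR or the linked posts). A one-step-greedy optimizer has no incentive to touch an off-switch, while a longer-horizon planner in the exact same environment routes through disabling it:

```python
from itertools import product

# Toy environment: positions 0..4 on a line, agent starts at 2, and
# earns reward equal to its position each step. At timestep 3 it is
# shut down (no further reward) unless it has already visited
# position 0, which permanently disables the off-switch.
START, SWITCH, SHUTDOWN_T, HORIZON = 2, 0, 3, 10
ACTIONS = (-1, 0, 1)  # move left, stay, move right

def rollout(plan):
    """Total reward of a fixed action sequence under the shutdown rule."""
    pos, disabled, total = START, False, 0
    for t, action in enumerate(plan):
        if t >= SHUTDOWN_T and not disabled:
            break  # shut down: no further reward, ever
        pos = max(0, min(4, pos + action))
        if pos == SWITCH:
            disabled = True
        total += pos  # per-step reward = current position
    return total

def myopic_total():
    """A one-step optimizer: each step, take the action with the best
    immediate reward. It never detours to the off-switch."""
    pos, disabled, total = START, False, 0
    for t in range(HORIZON):
        if t >= SHUTDOWN_T and not disabled:
            break
        pos = max(max(0, min(4, pos + a)) for a in ACTIONS)
        if pos == SWITCH:
            disabled = True
        total += pos
    return total

def long_horizon_total():
    """A horizon-10 optimizer: brute-force the best full plan. The
    optimum detours left to disable the off-switch first."""
    return max(rollout(plan) for plan in product(ACTIONS, repeat=HORIZON))
```

In this setup the myopic agent earns 11 and simply accepts shutdown, while the horizon-10 planner earns 27 by first walking to the switch: the incentive to resist shutdown only appears once the objective extends past the shutdown time, which is exactly the precondition the coherence-theorem argument needs.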