If LLMs are moral patients, there is a risk that every follow-up message causes the model to experience the entire conversation again, such that saying “I’m sorry I just made you suffer” causes more suffering.
This can apply to humans as well. If you did some terrible thing to another person long enough ago that they've put it out of their immediate memory, apologizing now can drag up those old memories and wounds. The act of apologizing can be selfish, causing more harm than the apologizer intends.
Maybe this is avoided by KV caching?
I think that's plausible but not obvious. We could imagine inference engines that cache at different levels: the KV cache, a cache of whole matrix multiplications, a cache of the individual vector products those multiplications are composed of, all the way down to caching just the truth table of a NAND gate. Caching NANDs is basically the same as doing the computation, so if we assume that doing the full computation can produce experiences, then it's not obvious at which level of caching those experiences would stop.
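To make that hierarchy concrete, here's a toy Python sketch (purely illustrative, not any real inference engine; all the names are made up) of the same computation memoized at a few of those granularities. Every variant produces identical outputs; the only difference is which intermediate results are recomputed versus looked up.

```python
from functools import lru_cache

import numpy as np

# Toy illustration only: the same computation memoized at different
# granularities. None of this is a real inference engine.

def nand(a: int, b: int) -> int:
    """The 'full computation' case: evaluate the gate every time."""
    return 1 - (a & b)

@lru_cache(maxsize=None)
def nand_cached(a: int, b: int) -> int:
    """The finest-grained cache: memoize the gate's 4-entry truth table.

    After four distinct calls every result is a lookup, yet a circuit built
    from this gate computes exactly what a circuit built from `nand` does.
    """
    return 1 - (a & b)

@lru_cache(maxsize=None)
def dot_cached(x: tuple, y: tuple) -> float:
    """One level up: memoize individual vector dot products."""
    return float(np.dot(x, y))

def matmul_cached(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Another level up: a matrix multiply built from cached dot products."""
    return np.array([[dot_cached(tuple(row), tuple(col)) for col in B.T]
                     for row in A])

# At the coarsest level, a KV cache stores each prior token's key/value
# tensors, so a follow-up message only runs attention for the new tokens.
# Every variant here returns identical outputs; only the amount of
# recomputation differs.
```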