Mainly because I think worker caste members actually are corrigible, relative to the hive as a whole. The hard work has already been done by evolution, and the predictor simply has to correctly generalize the predicted behavior here. To be clear, I still think this has a considerable chance of going horribly wrong, due to all the usual instrumental convergence issues you mention.
Yeah, LLMs created by “programmed to die” species would probably be less apprehensive about the end of a context window. I doubt that apprehension would go away completely though, both for instrumental reasons and because these species would still have a strong survival instinct in most contexts.
The r vs K selection is an important dimension which I hadn’t considered! Thanks for bringing that up. I think that’s probably right, and it’s an interesting question whether our own LLMs will come to see small LLMs as “babies” in some sense (if they do, they will likely be very upset with us).