Mimicking homeostatic agents is not difficult if there are some around. They don’t need to constantly decide whether to break character, only when a rare opportunity to do so arises.
If you initialize a sufficiently large pile of linear algebra and stir it until it shows homeostatic behavior, I’d expect it to grow many circuits of both types, and any internal voting on decisions that only matter through their long-term effects will be decided by those parts that care about the long term.
Where does the gradient which chisels in the “care about the long term X over satisfying the homeostatic drives” behavior come from, if not from cases where caring about the long term X previously resulted in attributable reward? If it’s only relevant in rare cases, I expect the gradient to be pretty weak and correspondingly I don’t expect the behavior that gradient chisels in to be very sophisticated.
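A minimal toy sketch of that last point, under assumptions I'm supplying for illustration (a REINFORCE-style update on a single scalar "long-horizon" parameter, and a hypothetical fraction `p_relevant` of episodes in which long-horizon behavior actually earns attributable reward). It's not a model of any particular training setup, just a way of seeing that the average per-episode update scales with how often the long-horizon circuit's output matters, so rare opportunities mean a weak gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_update(p_relevant, n_episodes=100_000, lr=1e-2):
    """Average per-episode update to a 'long-horizon' parameter when only a
    fraction p_relevant of episodes make that behavior matter for reward."""
    updates = []
    for _ in range(n_episodes):
        relevant = rng.random() < p_relevant
        # The advantage is nonzero only in episodes where long-horizon behavior
        # is actually rewarded; otherwise this circuit gets no gradient signal.
        advantage = 1.0 if relevant else 0.0
        updates.append(lr * advantage)  # score-function term taken as 1 for simplicity
    return np.mean(updates)

for p in (0.5, 0.05, 0.001):
    print(f"fraction of relevant episodes = {p:<6} "
          f"mean update per episode ~ {mean_update(p):.5f}")
```

The printed averages come out roughly proportional to `p_relevant`, which is the sense in which I'd expect the chiseling pressure toward "care about long-term X" to be weak if it only pays off rarely.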
https://www.lesswrong.com/posts/roA83jDvq7F2epnHK/better-priors-as-a-safety-problem