I claim that surviving and colonizing the galaxy are rather instrumentally convergent, and therefore I’m not surprised that (most?) humans want that.
By the same token, if our success criterion were to make an AGI that robustly pursues a goal which just so happens to align with a convergent instrumental subgoal for that AGI, then I would feel much more optimistic about that happening.
Arguing about the true nature of natural selection seems a bit pointless to me in this context. In RL, we often talk about how it can be ambiguous how to generalize the reward function out of the training distribution. Historically / “in the training distribution”, the only things that got rewarded in animal evolution were genes made of DNA. If future transhumans replace their DNA with some other nanotech thing, should we view that as “scoring high” on the same “reward function” that was used historically? That seems like a question with no right or wrong answer. I can say “the thing that was happening historically was optimizing genes made of DNA, and those future transhumans will fail on that metric”, or I can say “the thing that was happening historically was optimizing genes made of any kind of nanotech, and those future transhumans will succeed on that metric”. Those are two incompatible ways to generalize from the actual history of rewards, and I don’t think there’s a right answer for which generalization is “correct”.
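To make the ambiguity concrete, here’s a toy sketch in Python. Everything in it (the organism class, the substrates, the two reward functions) is a hypothetical illustration, not anything from the original argument or from actual evolutionary biology: the point is just that two reward functions can agree on every historical, in-distribution case and still diverge once you go out of distribution.

```python
# Toy illustration: two generalizations of the "same" historical reward signal
# that agree in-distribution but diverge out of distribution.

from dataclasses import dataclass

@dataclass
class Organism:
    substrate: str    # medium carrying the heritable information (e.g. "DNA")
    copies_made: int  # how many copies of that information it propagated

def reward_dna_only(org: Organism) -> int:
    """Generalization A: only DNA-based replication counts."""
    return org.copies_made if org.substrate == "DNA" else 0

def reward_any_substrate(org: Organism) -> int:
    """Generalization B: replicating heritable information in any substrate counts."""
    return org.copies_made

# "Training distribution": everything evolution actually rewarded was DNA-based,
# so the two reward functions are indistinguishable on the historical data.
history = [Organism("DNA", n) for n in (1, 3, 10)]
assert all(reward_dna_only(o) == reward_any_substrate(o) for o in history)

# Out of distribution: a transhuman who swapped DNA for some other nanotech.
transhuman = Organism("synthetic nanotech", 100)
print(reward_dna_only(transhuman))       # 0   -> "fails" under generalization A
print(reward_any_substrate(transhuman))  # 100 -> "succeeds" under generalization B
```

Nothing in the historical data distinguishes A from B, which is the sense in which neither generalization is “correct”.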