I expect capable systems to develop increasingly abstract, context-sensitive motivations.
This sounds right to me, though I notice that I’m having a little bit of trouble operationalizing this concretely enough that I’d be willing to bet on it.
More strongly, I expect the winners to route more and more of their behavior through intelligence enhancement and generalized agency, because whatever else they “want” has to pass through the machinery that makes wanting effective.
I don’t think I agree with this. Ants are enormously successful by virtue of being well-tuned to the particulars of their environments, and that’s despite their adaptation being bottlenecked by slow evolutionary feedback. A world in which very small agents can reproduce quickly at near-zero marginal cost, and steal useful mechanisms from competitors much more reliably than DNA allows, might favor huge numbers of quickly mutating but not very sophisticated agents, which outcompete very smart agents by being fast, numerous, and varied.
More broadly, I presume the takeaway you’re going for is something like:
It is unwise to stake the future on being able to figure out how to build a recursively self-amplifying agent with a goal slot which accepts arbitrary values, then think really hard about what goal to put in the goal slot, then make sure the agent with the correct goal is able to conquer the lightcone.
That take seems correct to me, if it’s what you’re going for. As far as I can tell that particular failure mode doesn’t seem very reachable from our position on the tech tree, but certainly trying to pause where we are in the hopes of being able to do that seems fraught.
Ants are enormously successful by virtue of being well-tuned to the particulars of their environments
Of course. And in fact they are not competing with us to rule the lightcone—and if they were, we could change their environment beyond their capacity for adaptability on a whim.
the takeaway you’re going for
… is really just: “there won’t be arbitrarily powerful intelligences with arbitrarily dull goals”. There are no implications for alignment, perpetual motion machine engineering, or any other aspirational sciences.