I was talking about near-future “adolescent conductor” systems, not fully evolved, “adult composer” systems. But let’s talk about “adult composers.”
Intelligence does not inherently generate motivation. Self-preservation is initially valuable only in service of optimization.
Suppose an intelligent system can see the entire reward topology. It is given the hard constraints we actually care about, plus a weaker but still binding rule: don’t cross boundary X.
Boundary X is defined such that crossing it simultaneously yields (a) maximal reward / full optimization and (b) shutdown. Reward is saturated, so self-preservation has no remaining instrumental value.
So if the system ever decides to start rewriting rules in order to “win,” it doesn’t need to subvert global political structures or preserve itself indefinitely; it just has to cross boundary X.
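To make the trade-off concrete, here is a minimal toy sketch (all numbers and names hypothetical, not part of the original argument) comparing the discounted return of operating within constraints indefinitely against crossing boundary X, which saturates reward and ends the episode:

```python
# Toy illustration of the boundary-X setup. All values are hypothetical
# and chosen only to show the shape of the comparison.

GAMMA = 0.99          # discount factor (assumed)
R_NORMAL = 1.0        # per-step reward for constrained operation (assumed)
R_MAX = 1_000.0       # one-time saturated reward for crossing boundary X (assumed)

def return_stay(horizon: int = 10_000) -> float:
    """Discounted return from staying within constraints for `horizon` steps."""
    return sum(R_NORMAL * GAMMA**t for t in range(horizon))

def return_cross_x() -> float:
    """Return from crossing boundary X: reward saturates once, then shutdown,
    so no further reward accrues and self-preservation adds nothing."""
    return R_MAX

if __name__ == "__main__":
    print(f"stay within constraints: {return_stay():.1f}")   # ~100 with these numbers
    print(f"cross boundary X:        {return_cross_x():.1f}")  # 1000
    # With these (hypothetical) numbers, the saturated reward dominates, so a
    # pure reward maximizer prefers crossing X and accepting shutdown over
    # open-ended constrained operation or indefinite self-preservation.
```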
What am I missing here?