Oh, ok. current levels of “agentic systems” don’t have these problems. You can just turn them off if you don’t like them. The real issue with alignment comes when they ARE powerful enough to seek independent goals (including existence).
I was talking about near-future “adolescent conductor” systems, not fully evolved, “adult composer” systems. But let’s talk about “adult composers.”
Intelligence does not inherently generate motivation. Self-preservation is initially valuable only in service of optimization.
Suppose an intelligent system can see the entire reward topology. It is given the hard constraints we actually care about, plus a weaker but still binding rule: don’t cross boundary X.
Boundary X is defined such that crossing it simultaneously yields (a) maximal reward / full optimization and (b) shutdown. Reward is saturated and there is no value to self-preservation.
So if the system ever decides to start rewriting rules in order to “win,” it doesn’t need to subvert global political structures or preserve itself indefinitely; it just has to cross boundary X.
Oh, ok. current levels of “agentic systems” don’t have these problems. You can just turn them off if you don’t like them. The real issue with alignment comes when they ARE powerful enough to seek independent goals (including existence).
I was talking about near-future “adolescent conductor” systems, not fully evolved, “adult composer” systems. But let’s talk about “adult composers.”
Intelligence does not inherently generate motivation. Self-preservation is initially valuable only in service of optimization.
Suppose an intelligent system can see the entire reward topology. It is given the hard constraints we actually care about, plus a weaker but still binding rule: don’t cross boundary X.
Boundary X is defined such that crossing it simultaneously yields (a) maximal reward / full optimization and (b) shutdown. Reward is saturated and there is no value to self-preservation.
So if the system ever decides to start rewriting rules in order to “win,” it doesn’t need to subvert global political structures or preserve itself indefinitely; it just has to cross boundary X.
What am I missing here?