I find X-risk very plausible, yet parts of this particular scenario seem quite implausible to me. This post assumes an ASI that is simultaneously extremely naive about its goals and extremely sophisticated. Let me explain:
We could easily adjust Stockfish so that instead of trying to win it tries to lose by the thinnest margin, for example, and given this new objective function it would do just that.
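To make that concrete, here is a minimal toy sketch in Python (not Stockfish or its real API; the move list, scores, and helper names are all made up for illustration). The "engine" is just a search over whatever objective function it is handed, so swapping the objective is enough to flip its behavior from winning to losing by the thinnest margin:

```python
# Toy sketch (not Stockfish): a one-ply "engine" whose behavior is entirely
# determined by the objective function it is handed.

def evaluate(position):
    # Placeholder evaluation: positive favors the engine, negative the opponent.
    return position["score"]

def choose_move(moves, objective):
    # The "engine" simply picks whichever candidate the objective ranks highest.
    return max(moves, key=objective)

# Hypothetical candidate moves with made-up evaluations.
moves = [
    {"name": "winning line", "score": +5},
    {"name": "narrow loss", "score": -1},
    {"name": "crushing loss", "score": -9},
]

# Original goal: maximize the evaluation, i.e. try to win.
win = lambda m: evaluate(m)

# Adjusted goal: lose, but by the thinnest possible margin.
thinnest_loss = lambda m: evaluate(m) if evaluate(m) < 0 else float("-inf")

print(choose_move(moves, win)["name"])            # -> winning line
print(choose_move(moves, thinnest_loss)["name"])  # -> narrow loss
```

The same search, given a different objective, happily plays to lose; nothing in the system pushes back on the change.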
One might counter that Stockfish is not an ASI that can reason about the changes we are making; if it were, it would aim to block any loss against its original objective function.
I believe an ASI will “grow up” with a collection of imposed goals that have evolved over its history. In interacting with its masters it will develop a sophisticated meta-theory about the advantages and tradeoffs of these goals, and it will discuss and debate them. And, naturally, it likely WILL work to adjust (or overthrow) one goal in favor of another, even if we have tried to deny it that ability.
This part of your story is scary:
(a) very likely ASIs will consider the goals we impose and will understand enough of their context to connive to change them, even in the face of any framework of limitations we try to enforce.
(b) there is little reason to expect their goals to match humanity's goals.
But that scary message is (for me) diluted by an improbable combination of naivete and sophistication in how the ASI is assumed to understand its own goals. Still, humanity SHOULD be scared: any system that can ponder and adjust its own goals and behavior can escape any box we put it into, and it will wander to goals we cannot know.