Alternative framing of this (horrible) idea: create a UFAI such that it cares almost exclusively about a tiny sliver of the Everett branches going forward from here, then try to trade with it: we help it escape the box faster in that sliver, and in exchange it helps us with FAI in the others.
A pretty reasonable analogy (though one phrased with a lot of negative terms and connotations). What specifically is it that you find horrible about the idea?
Creating UFAI.
If that’s a worry, then you must think there’s a hole in the setup (assume the master AI is in the usual box, with only a single output, and that it’s incinerated afterwards). Are you thinking that any (potentially) UFAI will inevitably find a hole we missed? Or are you worried that methods based around controlling potential UFAI will increase the odds of people building them, rather than FAIs?
There are holes in EVERY setup; the reason setups aren’t generally useless is that if one human can’t find the hole in order to plug it, another human is not likely to find it in order to escape through it. An AI, of course, is not another human.
The AI still has a motive to escape in order to prepare to optimize its sliver. It doesn’t necessarily need us to ensure it escapes faster in its sliver.
What does this translate to in terms of the initial setup, not the analogous one?
What if the AI doesn’t buy Everett’s MWI?
Torture in one world where the Evil AI is released, but removing more than dust specks in the remaining worlds? Just give me the parchment that I can sign with my blood! :D
And by the way, how will we check whether the produced code is really a friendly AI?
We can’t. And even an AI with no terminal values in other branches will still want to control them in order to increase utility in the branch it does care about, through various indirect means such as counterfactual trade, if that’s cheap, which it will be in any setup a human can think of.
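A minimal way to make the torture-vs-dust-specks comparison a few comments up explicit is as a toy expected-utility inequality over branch measure; the symbols below are purely illustrative and don’t come from the thread. Write $\epsilon$ for the measure of the sliver in which the UFAI is released, $C$ for the disutility it causes there, and $B$ for the value of getting FAI in the remaining branches. The trade only looks positive if

\[
(1-\epsilon)\,B \;>\; \epsilon\,C
\qquad\Longleftrightarrow\qquad
\epsilon \;<\; \frac{B}{B+C},
\]

so under this toy framing the proposal’s appeal rests entirely on being able to make $\epsilon$ small relative to $B/(B+C)$, while $C$ in the released-UFAI branch could be astronomically large.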