Second, because even if we really did try to approximate Team Manipulation’s effects using abstract reasoning, it’s (even theoretically) hard to figure out what effects those would be without running their Turing machine (and all its similarly-short but similarly-expensive relatives) to find out.
Why is it hard? I mean, if we explicitly code “expect simple things” into the AI, wouldn’t it figure out the most probable kind of thing for Team Manipulation to do? We can speculate about what they would do, after all.
The beings inside Team Manipulation aren’t going to be simple creatures with simple desires just because they live in a simple universe. I mean, we live in a simple universe, and look at us!
But more importantly, the relationship between what they want and the effect that will have on the output of the Solomonoff oracle is hard to figure out without actually simulating them simulating us. Most of the time the output is just a string of bits with unclear long-term impact.
But now that you mention it, maybe there are edge cases. If I built a universe-destroying bomb and rigged it to go off if the Solomonoff oracle output a 1, maybe we could expect them not to set off the bomb (unless they are actively competitive with other universes, or think it would be funny, or don’t gain much from influencing our universe...).