Can we create a function that provably predicts the optimization power of intelligences?

Follow-up to: Efficient Cross-domain Optimization

When I say I am skeptical that we will ever understand intelligence, what I mean is that I am skeptical we will ever be able to reliably map a system's description onto its optimization power. This has implications for how well we will be able to create intelligences and how good intelligences will be at improving themselves.

Obviously we can't predict the effectiveness of an arbitrary program: by Rice's theorem, intelligence is a non-trivial semantic property, so no program can decide it for all programs. The best we can hope for is predicting the effectiveness of some restricted set of programs. Is such a function possible? This is my take on the subject.
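To make the Rice's-theorem point concrete, here is a minimal sketch in Python. Everything in it (simulate, strong_optimizer, build_probe) is a hypothetical stub of my own, not part of the original argument; the shape of the reduction is the point. If a total, computable o existed for arbitrary programs, it would decide the halting problem:

```python
# Hedged sketch of a Rice-style reduction; all names are
# hypothetical stubs, not a real implementation.

def simulate(machine, tape):
    """Stub: in the real reduction this runs `machine` on `tape`,
    returning only if the machine halts (and looping forever otherwise)."""
    pass

def strong_optimizer(environment):
    """Stub standing in for any program with high optimization power."""
    return "goal achieved"

def build_probe(machine, tape):
    """Build a program whose optimization power hinges on whether
    `machine` halts on `tape`."""
    def probe(environment):
        simulate(machine, tape)               # gets past this line iff machine halts
        return strong_optimizer(environment)  # so probe optimizes iff machine halts
    return probe

# If o(build_probe(M, x)) were computable and reliable, then
# "does M halt on x?" would reduce to "is o(build_probe(M, x)) high?",
# contradicting the undecidability of the halting problem.
```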

Let o(p) be a function that maps a program p to its optimization power.

Mu, Omega's younger brother, has a challenge for you. You get to design a system and put it in a box with 20 red and 20 green balls; the system will activate itself after 10 minutes and then pursue the goal of removing as many red balls from the box as possible within 10 minutes. You have to decide whether it is going to remove more or fewer than 5 red balls from the box. You get transported to nirvana if you predict correctly, and your world gets turned into paperclips if you get it wrong.

You whip out your trusty o, make a program, evaluate it using o, and bet according to its evaluation.

Unknown to you, Mu also has a copy of your o and runs it on the systems you put in the box. Those that score highly on the optimization-power measure, it destroys before they activate; those that score low, it fulfils their goals for them. In the second case it is still p that causes the goal to be fulfilled: if p were different, the goal would be fulfilled to a different degree. You can see it as p inspiring pity in someone who would not otherwise have helped. It is still winning.
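Here is a toy model of Mu's move in Python. The threshold of 5 and the 20 balls follow the story, but the functions themselves are illustrative stubs of my own:

```python
# Toy model of Mu's diagonalization. `o_example` stands in for
# whatever predictor you built; Mu inverts it regardless.

THRESHOLD = 5  # the bet: more or fewer than 5 red balls removed

def mu_outcome(p, o):
    """Red balls actually removed, given Mu's interference."""
    if o(p) >= THRESHOLD:
        return 0    # Mu destroys systems that o rates highly
    return 20       # Mu fulfils the goal for systems that o rates poorly

def your_bet(p, o):
    return "more" if o(p) >= THRESHOLD else "fewer"

def actual(p, o):
    return "more" if mu_outcome(p, o) >= THRESHOLD else "fewer"

def o_example(p):
    return 10  # any fixed verdict will do; try changing it

p = object()  # stands in for whatever system you designed
assert your_bet(p, o_example) != actual(p, o_example)  # o is always wrong
```

Whatever verdict o commits to, Mu makes the opposite come true, so the failure does not depend on which program you chose.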

So Mu forces o to be wrong: o was not the reliable predictor of a set of programs' optimization power that we had hoped for, and we have a contradiction. Is there any way to salvage it? You could make your effectiveness measure depend upon the environment e as well, giving o(p, e); however, that does not remove the potential for self-reference, since o is itself part of the environment. We might be able to rescue o by constraining the environment to contain no reference to o. However, we neither control the environment nor have perfect knowledge of it, so we cannot tell when it contains references to o, and hence cannot tell when o is reliable.
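To see why passing the environment as an argument does not help, here is one more small sketch, again with hypothetical names of my own: an environment that carries a copy of the predictor and consults it recreates the same inversion.

```python
# Sketch: an environment-aware predictor o2(p, e) fails the same way
# once the environment e contains a copy of o2.

def o2(p, e):
    """Hypothetical environment-aware predictor; reading a number off
    p stands in for a real analysis of p's behaviour within e."""
    return p.rated_power

class MuEnvironment:
    """An environment with o2 baked into it."""
    def outcome(self, p):
        # The environment consults o2 about p in this very
        # environment, then inverts whatever o2 commits to.
        if o2(p, self) >= 5:
            return 0    # destroy p before it acts
        return 20       # fulfil p's goal for it

class SomeSystem:
    rated_power = 10  # o2 rates this system as a strong optimizer

e = MuEnvironment()
p = SomeSystem()
assert o2(p, e) >= 5 and e.outcome(p) == 0  # high prediction, zero achievement
```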

You could try to make it so that Mu can have no impact on what p does. But that is the same as trying to make the system indestructible, and with a reversible physics, whatever can be created can be destroyed.

So where do we go from here?