failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety.
Right, I think more discussion is warranted.
How can you be sure that the seed won’t already need to be that creative in order for the iterations to get anywhere?
If general problem-solving is even possible, then an algorithm exists that solves the problems well without cheating.
And even if the seed is not too creative initially, how can you be sure its descendants won’t be either?
I think this won’t happen because all the progress is driven by criterion (3). In order for a non-meta program (2) to create a meta-version, there would need to be some kind of benefit according to (3). Theoretically, if (3) were hackable, it would be possible for a newly proposed version of (2) to exploit this; but I don’t see why the current version of (2) would be any more likely than, say, random chance to create hacky versions of itself.
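To make this concrete, here is a toy sketch of the selection step I have in mind (the names are illustrative only, not the actual framework): a proposed rewrite of (2) replaces the current version only if the fixed criterion (3) scores it higher on the benchmark problems, so a hacky successor earns no special advantage over any other proposal.

```python
from typing import Callable, Dict, List

# Illustrative type aliases; the real problem/solution encodings are unspecified.
Problem = Dict[str, float]
Solution = Dict[str, float]
Solver = Callable[[Problem], Solution]


def score_solution(problem: Problem, solution: Solution) -> float:
    """Hypothetical domain-specific objective; here just a stored value."""
    return solution.get("objective_value", 0.0)


def criterion_3(solver: Solver, benchmark: List[Problem]) -> float:
    """Program (3): fixed scoring of a solver purely by the solutions it
    outputs on benchmark problems. It is never modified by (2)."""
    return sum(score_solution(p, solver(p)) for p in benchmark)


def improvement_step(current: Solver,
                     propose_rewrite: Callable[[Solver], Solver],
                     benchmark: List[Problem]) -> Solver:
    """One iteration of program (2) rewriting itself. The proposed successor
    is adopted only if criterion (3) rates it strictly higher; otherwise the
    rewrite is simply discarded."""
    candidate = propose_rewrite(current)
    if criterion_3(candidate, benchmark) > criterion_3(current, benchmark):
        return candidate
    return current
```

The point of the sketch is just that `propose_rewrite` has no channel of influence other than the candidate it returns, and the candidate is judged by nothing except `criterion_3`.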
Don’t say you’ve solved friendly AI until you’ve really worked out the details.
Ok, I’ve qualified my statement. If it all works, I’ve solved friendly AI for a limited subset of problems.
That’s only if you plop a ready-made AGI into the framework. The framework is meant to grow a stupider seed AI.
Program (3) cannot be rewritten. Program (2) is the only thing that is changed. All it does is improve itself and spit out solutions to optimization problems. I see no way for it to “create a more effective problem solving AI”.
It provides guidance for a seed AI to grow into a better solver of optimization problems, without letting it take actions whose effects extend beyond solving those problems.
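Continuing the toy sketch above (the names are still purely illustrative, not the actual framework), the intended separation looks roughly like this: program (2) is only ever run through a narrow harness, its only outputs are the solutions it returns and the rewrites it proposes, and criterion (3) is held by the framework where (2) cannot touch it.

```python
from typing import Callable, Dict, List

Problem = Dict[str, float]
Solution = Dict[str, float]
Solver = Callable[[Problem], Solution]
Criterion = Callable[[Solver, List[Problem]], float]


class Framework:
    """Toy harness: program (2) is only invoked through these two methods,
    so its externally visible effects are the solutions it returns and the
    rewrites it proposes. Criterion (3) stays on this side of the wall."""

    def __init__(self, seed: Solver, benchmark: List[Problem], criterion: Criterion):
        self._solver = seed              # program (2), initially the seed
        self._benchmark = benchmark
        self._criterion = criterion      # program (3): fixed, not writable by (2)

    def solve(self, problem: Problem) -> Solution:
        # (2) "spits out" a solution; nothing else leaves the harness.
        return self._solver(problem)

    def consider_rewrite(self, propose: Callable[[Solver], Solver]) -> None:
        # (2) may propose a successor, but adoption is decided by (3) alone.
        candidate = propose(self._solver)
        if (self._criterion(candidate, self._benchmark)
                > self._criterion(self._solver, self._benchmark)):
            self._solver = candidate
```

The design intent is that criterion (3) and the benchmark live entirely on the framework’s side, so a rewrite of (2) can change how problems are solved but not what counts as solving them well.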