And they all share a curious pattern. Even though the computer can destroy itself without complaint, and even salvage itself for spare parts if matter is scarce, it never seems to exhibit any instability of values.
But aren’t we talking about a thought experiment here?
We are. I wanted to show that stable goal systems aren’t difficult to build, even in a setting that allows all sorts of weird actions like self-modification. You can just see at a glance whether a given system is stable: my examples obviously are, and Mitchell’s stochastic example obviously isn’t.
Note: These arguments stem from a re-reading of the OP; they’re not directly related to my initial comment.
If it were easy, we could do it. This is showing that if there are buildable life mechanisms that will perform the task, a life computer can find one and build it.
The computer never has instability of values because it never modifies itself until it has a proven plan, but finding the plan and the proof is left as an exercise for the computer.
The built mechanism might be very complicated, the proof of its behavior might be huge, and the computer might need to think for a long time to get it (which isn’t a problem if the goal is an operation on a stable universe and the computer is big enough).
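To make that loop concrete, here’s a rough sketch (my own, not anything from the OP) of the “prove before you act” pattern. Every name is a hypothetical placeholder, and the genuinely hard parts are stubbed out, which is exactly the point: checking is easy, finding is not.

```python
# Minimal sketch, not a real implementation: the computer only acts on a
# plan once a proof of goal-preservation has been found and checked.

def candidate_plans(goal):
    """Enumerate candidate mechanisms / self-modifications (stub)."""
    return []  # the open-ended search lives here

def find_proof(plan, goal):
    """Try to produce a proof that executing `plan` preserves `goal` (stub)."""
    return None

def proof_checks(proof, plan, goal):
    """Mechanically verify the proof; cheap compared with finding it."""
    return proof is not None

def act(goal, execute):
    for plan in candidate_plans(goal):
        proof = find_proof(plan, goal)
        if proof_checks(proof, plan, goal):
            execute(plan)   # only ever build/modify once a proof checks out
            return plan
    return None             # otherwise do nothing: values stay untouched

print(act(goal="some goal", execute=print))  # -> None: no proven plan, no action
```

Stability is easy to see at this level: the only path to self-modification runs through a checked proof, so the computer either acts in a provably goal-preserving way or it doesn’t act at all.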
If you want to do things faster or smaller, your program needs to be smarter, and perhaps its creations need to be smarter. You can still pass them through a proof checker so you know they are goal-friendly, but that doesn’t tell you or the program how to find things that are goal-friendly in a time/space-efficient manner.
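To illustrate that gap between checking and finding, here’s a toy (again mine, with a made-up acceptance condition standing in for the real proof obligation): the checker is a fast filter, but the search it filters is still blind enumeration.

```python
# Hedged illustration: a proof checker gives you safety, not efficiency.
# The candidate space of size-n designs grows exponentially, and the
# checker does nothing to shrink the search.

from itertools import count, product

ALPHABET = "01"  # hypothetical encoding of candidate mechanisms

def checks_out(candidate: str) -> bool:
    """Stand-in for the proof checker: fast, mechanical, goal-specific."""
    return candidate.endswith("11")  # placeholder acceptance condition

def first_goal_friendly():
    for n in count(1):                          # ever-longer candidates
        for bits in product(ALPHABET, repeat=n):
            candidate = "".join(bits)
            if checks_out(candidate):           # cheap to verify...
                return candidate
    # ...but nothing here makes the enumeration itself any smarter

print(first_goal_friendly())  # finds "11" here, but only by blind search
```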
You could end up with a paper clipper running inside your program, somehow forced to pursue your given utility function, which to me looks a lot like sandboxing.
Also, in each case listed the resulting mechanism probably wouldn’t itself have any general intelligence. The goal would need to be pretty complex to require it. So in these cases the program won’t be solving the stable goal problem either, just building machines.