That’s my point. If they do care about that, then the AI will do it. If it doesn’t, then it’s not working right.
So care about other people how? And to what extent? That’s the point of things like CEV.
It’s only really a contradiction to us. Either the AI has a goal to make sure that there is always a democracy, or it has a goal simply to build a democracy, in which case it can abolish itself if it decides to do so.
Insufficient imagination. What if, for example, we tell the AI to try the first one and then it decides that the solution is to kill the people who don’t support a democracy? That’s the point: even when you’ve got something resembling a rough goal, you are assuming your AI will accomplish it the way a human would.
To get some idea of how easily something can go wrong, it might help to, say, read about the stamp collecting device for starters. There’s a lot that can go wrong with an AI. Even dumb optimizers often arrive at answers that are highly unexpected. Smart optimizers have the same problems, but more so.
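As a minimal toy sketch of that last point (my own illustration, not the stamp collector itself): give even a dumb search a sloppy objective and it will happily exploit the loophole. Here the intended task is “produce a sorted version of the list”, but the fitness function only checks that adjacent elements are in order, so an empty output scores just as well as the real answer. The function and variable names are made up for illustration.

    # Toy sketch: a naive objective for "sort this list" that a dumb optimizer can game.
    def fitness(output):
        # Intended: reward sorted outputs. Flaw: says nothing about keeping the data.
        pairs = list(zip(output, output[1:]))
        if not pairs:                      # empty output is vacuously "in order"
            return 1.0
        return sum(a <= b for a, b in pairs) / len(pairs)

    data = [5, 3, 8, 1]
    candidates = [sorted(data), data, []]  # a brute-force "search" over three outputs
    for c in candidates:
        print(c, fitness(c))
    # [1, 3, 5, 8] scores 1.0, the unsorted original scores 0.33, and [] also
    # scores 1.0, so the optimizer has no reason to prefer the intended answer.

The objective is satisfied perfectly, just not in the way anyone wanted; the stamp collector is the same failure mode scaled up to an agent acting in the world.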
Bad AIs can, sure. If it’s bad, though, what does it matter whose orders it’s trying to follow? It will ultimately try to turn them into paperclips as well.
What matters is that an unfriendly AI will make things bad for everyone. If someone screws up just once and makes a very smart paperclipper, then that’s an existential threat to humanity.
You’re right. Sorry. There are a lot of variables to consider, and it is one likely scenario. Currently, the internet isn’t interfaced with the actual world enough that you could control everything from it, and I can’t see any possible way any entity could take over. That doesn’t mean it can’t happen, but it’s also wrong to assume it will.
Well, no one is assuming that it will. But some people assign the scenario a high probability, and even a very tiny probability is worth worrying about given how bad the outcome would be. Note incidentally that there’s a lot a very smart entity could do simply with basic internet access. For example, consider what happens if the AI finds a fast way to factor numbers. Well then, lots of secure communication channels over the internet are now vulnerable. And that’s aside from the more plausible but less dramatic problem of an AI finding flaws in programs that we haven’t yet noticed. Even if our AI just decided to take over most of the world’s computers to increase processing power, that’s a pretty unpleasant scenario for the rest of us. And that’s on the lower end of problems. Consider how often some bad hacking incident occurs because a system that should never have been online is accessible online. Now think about how many automated or nearly fully automated plants there are (for cars, for chemicals, for 3D printing). And that situation will only get worse over the next few years.
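To make the factoring point concrete, here is a minimal toy sketch (my own illustration, not from the thread). RSA-style public-key encryption, which protects much internet traffic, is only secure as long as factoring the public modulus is hard; once you have the factors, recovering the private key is a few lines of arithmetic. The tiny numbers below are made up for illustration.

    # Toy sketch: why fast factoring breaks RSA-style encryption.
    # Real keys use primes hundreds of digits long; these are absurdly small.
    p, q = 61, 53              # secret primes an attacker normally cannot recover
    n = p * q                  # public modulus (3233), published with the key
    e = 17                     # public exponent
    phi = (p - 1) * (q - 1)    # totient; computable only if you can factor n
    d = pow(e, -1, phi)        # private exponent, derived straight from the factors

    message = 42
    ciphertext = pow(message, e, n)    # anyone can encrypt with (n, e)
    recovered = pow(ciphertext, d, n)  # with p and q known, decryption is trivial
    assert recovered == message

The hard step is factoring n; everything after that is mechanical, which is why a fast factoring algorithm would immediately turn a lot of public keys into private ones.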
Worse, a smart AI can likely get people to release it from its box and allow it a lot more free rein. See the AI box test. Even if the AI has trouble with that, an AI with internet access (which you seem to think wouldn’t be that harmful) might not have trouble finding someone sympathetic to it if it portrayed itself sympathetically. These are only some of the most obvious failure modes. It may well be that some of the sneakiest things such an AI could do won’t even occur to us because they are so far beyond anything humans would think of. It helps for this sort of thing not only to have a minimally restricted imagination, but also to realize that even such an imagination is likely too small to encompass all the possible things that can go wrong.
If I understand Houshalter correctly, then his idea can be presented using the following story:
Suppose you worked out the theory of building self-improving AGIs with stable goal systems. The only problem left now is to devise an actual goal system that will represent what is best for humanity. So you spend the next several years engaged in deep moral reflection and finally come up with the perfect implementation of CEV, completely impervious to the tricks of Dr. Evil and his ilk.
However, the morality upon which you have reflected for all those years isn’t an external force accessible only to humans. It is a computation embedded in your brain. Whatever you ended up doing was the result of your brain-state at the beginning of the story and the stimuli that have affected you since that point. All of this could have been simulated by a Sufficiently Smart™ AGI.
So the idea is: instead of spending those years coming up with the best goal system for your AGI, simply run it and tell it to simulate a counterfactual world in which you did, and then to do what you would have done. Whatever results from that, you couldn’t have done better anyway.
Of course, this is all under the assumption that formalizing Coherent Extrapolated Volition is much more difficult than formalizing My Very Own Extrapolated Volition (for any given value of me).
To get some idea of how easily something can go wrong, it might help to, say, read about the stamp collecting device for starters.
Thanks for that link. That is brilliant, especially Eliezer’s comment:
Seth, I see that you were a PhD student in NEU’s Electrical Engineering department. Electrical engineering isn’t very complicated, right? I mean, it’s just:
    while device is incomplete
        get some wires
        connect them
The part about getting wires can be implemented by going to a hardware store, and as for connecting them, a soldering iron should do the trick.