“there is some number of colors k for which it is NP-hard (that is, effectively impossible) to distinguish between networks in which it is possible to satisfy at least 99% of the constraints and networks in which it is possible to satisfy at most 1% of the constraints”. I think this sentence is concerning for those interested in the possibility of creating FAI.
Not really. This sounds very similar to an argument I’ve heard that misuses Rice’s theorem to conclude that creating a provably friendly AI is impossible. Here’s the argument, as I remember it: “By Rice’s theorem, there is no general method for determining whether any given AI is an FAI. Therefore it is impossible to build an AI that you can prove is an FAI.” The first step of the argument is correct. However, the next step does not follow; just because there is no method that can correctly tell you which of all possible AIs is friendly, it does not follow that there does not exist a method to prove that some particular AI is friendly. You could use the exact same argument to “prove” that there is no program that provably halts; after all, there is no general method for determining whether an arbitrary program halts. But of course writing a program that provably halts is actually quite easy.
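A minimal sketch of the point about halting: Rice's theorem rules out a single algorithm that decides halting for *every* program, but that says nothing about our ability to prove that a *particular* program halts. The function below is a hypothetical example chosen for illustration.

```python
# Although no algorithm decides halting for ALL programs (Rice's theorem /
# the halting problem), a specific program can be trivially proven to halt.
def provably_halts():
    # The loop body runs exactly 10 times, so execution must terminate.
    total = 0
    for i in range(10):
        total += i
    return total

print(provably_halts())  # 45
```

The proof of termination is a one-liner (a bounded loop), even though the general decision problem is undecidable.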
It looks to me like your argument makes the same error, but starting from this graph-coloring result that I don’t know anything about, instead of from Rice’s theorem. The problem of classifying networks by how many of a given set of constraints they can satisfy may be NP-hard, but that does not mean it is difficult to construct some particular network in which you can satisfy most or even all of the constraints.
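To make the distinction concrete, here is a hedged toy example (the graph and constraint set are my own, not from the original result): deciding near-satisfiability of coloring constraints is NP-hard in general, yet it is easy to exhibit a specific network where every constraint is satisfiable.

```python
# A 4-cycle is bipartite, so two colors suffice to satisfy every
# "adjacent vertices must differ" constraint, even though the general
# classification problem over arbitrary networks is NP-hard.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # constraints: endpoints differ
color = {v: v % 2 for v in range(4)}       # alternate two colors around the cycle
satisfied = all(color[u] != color[v] for u, v in edges)
print(satisfied)  # True
```

Hardness of the general decision problem and ease of constructing a satisfiable instance coexist without contradiction.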
It is possible that I’m misunderstanding your argument, but even if I am, I’m not particularly worried that it could be a real problem. You don’t give a concrete reason to believe that this coloring problem has much to do with FAI. In particular, how does it relate to FAI in a way that it doesn’t relate to an operating system? Operating systems are also programs that we want to satisfy a complicated set of constraints. But we can make those.
I think my reply to Lurker, above, might clarify some things. To answer your question:
Making an operating system is easy. Deciding which operating system should be used is harder. This is true despite the fact that an operating system’s performance on most criteria can easily be assessed. Assessing whether an operating system is fast is easier than assessing whether a universe is “just”, for example. Also, choosing one operating system from a set of preexisting options is much easier than choosing one future from all the possibilities that could be created (unimaginably many). Finally, there are far fewer tradeoffs involved in the choice of an operating system than in the choice of satisficing criteria. You might trade money for speed, or speed for usability, and be able to guess correctly a reasonably high percentage of the time. But few other tradeoffs exist, so the situation is comparatively simple. If you compared two AIs with different satisficing protocols, it would be very hard to guess which one did a better job just by looking at the results they created, unless one failed spectacularly. But the fact that a tradeoff is difficult to judge doesn’t mean it doesn’t exist. Because I think human values are complicated, I think such tradeoffs are very likely to exist.
On LW, we are used to imagining spectacular failures resulting from tiny mistakes. An AI that’s designed to make people smile creates a monstrous stitching of fleshy carpet. But we are not used to imagining failures that are less obvious, but still awful. I think that’s a problem.
I think people understand that this is a danger, but by nature, one can’t spend a lot of time imagining these situations. Also, UGC has little to do with this problem.