Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
> Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
>
> Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren’t saying this because we get a kick out of being bleak. It’s just that those powerful machine intelligences will not be born with preferences much like ours.
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that at least some commonly made counting arguments are wrong (Many arguments for AI x-risk are wrong), and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson I found the discussion too technical to follow.
In any case, the version of the counting argument in the book seems simple enough that as a layperson I can tell that it’s wrong. To me it seems like it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling current methods will likely not choose to build a future full of happy, free people because there are many more possible preferences an ASI could have than the narrow subset of preferences that would lead it to build such a future, I think the argument is wrong.
This counting observation does seem like a reason to think that the preferences an ASI ends up with might not be the preferences its creators tried to train into it (so perhaps the “no evidence” in the linked post title above is too strong): the target preferences really are a narrow target, and narrow targets are easier to miss than broad ones. But surely the counting observation alone is not sufficient to conclude that ASI creators will fail to hit their narrow target. It seems like you would need further reasons to conclude that.
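To spell out the arithmetic behind the counting intuition, here is a toy calculation; the set sizes are made-up assumptions purely for illustration, not anything from the book or the linked posts. Under a uniform draw, the chance of landing in the narrow target is just the ratio of the two set sizes, exactly as with a lottery ticket.

```python
# Toy "counting argument" arithmetic. The sizes are made-up assumptions,
# chosen only to illustrate the ratio; they don't come from the book.

TOTAL_PREFERENCE_CONFIGS = 10**12   # hypothetical size of the space of possible preferences
ALIGNED_CONFIGS = 10**3             # hypothetical size of the narrow "happy, free people" target

# Under a uniform ("random potshot") measure, the hit probability is just a ratio.
p_uniform_hit = ALIGNED_CONFIGS / TOTAL_PREFERENCE_CONFIGS
print(f"P(hit narrow target | uniform draw) = {p_uniform_hit:.1e}")  # 1.0e-09
```

This calculation only says something about AI outcomes if the uniform measure is the right one, which is exactly the step the counting observation by itself doesn’t establish.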
Yes, in my language it’s a *random potshot* fallacy.
One would be the random potshot version of the Orthogonality Thesis, where there is an even chance of hitting any mind in mindspace, and therefore a high chance of hitting an eldritch, alien mind. But equiprobability is only one way of turning possibilities into probabilities, and not a particularly realistic one: taking a random potshot isn’t analogous to deliberately building a certain type of AI, which is not done without knowing anything about what it will be.
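Here is a minimal sketch of that point; the numbers and the exponential “tilt” standing in for whatever training actually does are my own assumptions, not a model of any real training process. The same narrow target that is essentially unreachable under a uniform measure can get substantial probability once the measure is concentrated toward it.

```python
import math

# Toy model: "preference space" is the integers 0..N-1, and the narrow target
# is the small set {0, ..., K-1}. All numbers here are illustrative assumptions.
N = 10**5   # size of the preference space
K = 10      # size of the narrow target

# Uniform ("random potshot") measure: mass is spread evenly, so the target gets K/N.
p_uniform = K / N

def tilted_target_mass(beta: float) -> float:
    """Probability of the target under an exponential tilt of strength beta.

    The tilt is a crude stand-in for a process that pushes probability mass
    toward the target region; beta = 0 recovers the uniform case.
    """
    weights = [math.exp(-beta * i) for i in range(N)]
    return sum(weights[:K]) / sum(weights)

print(f"uniform measure:   P(target) = {p_uniform:.4f}")
for beta in (0.0, 0.01, 0.1, 1.0):
    print(f"tilt beta={beta:<4}: P(target) = {tilted_target_mass(beta):.4f}")
```

Which distribution actually describes building an AI with current methods is the substantive question; counting possibilities alone doesn’t answer it.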
Does this just change the problem to one of corrigibility? If the target is narrow but an AI can be guided toward it, that’s good. If the target is narrow and an AI cannot be guided effectively, then it’s predictably not going to hit the target.
I think you have to assume one of incorrigibility, very rapid takeoff, or deception for a doom argument to go through.