# Positive Bias: Look Into the Dark

I am teaching a class, and I write upon the blackboard three numbers: 2-4-6. “I am thinking of a rule,” I say, “which governs sequences of three numbers. The sequence 2-4-6, as it so happens, obeys this rule. Each of you will find, on your desk, a pile of index cards. Write down a sequence of three numbers on a card, and I’ll mark it ‘Yes’ for fits the rule, or ‘No’ for not fitting the rule. Then you can write down another set of three numbers and ask whether it fits again, and so on. When you’re confident that you know the rule, write down the rule on a card. You can test as many triplets as you like.”

Here’s the record of one student’s guesses:

| Triplet | Answer |
| --- | --- |
| 4-6-2 | No |
| 4-6-8 | Yes |
| 10-12-14 | Yes |

At this point the student wrote down their guess at the rule. What do *you* think the rule is? Would you have wanted to test another triplet, and if so, what would it be? Take a moment to think before continuing.

The challenge above is based on a classic experiment due to Peter Wason, the 2-4-6 task. Although subjects given this task typically expressed high confidence in their guesses, only 21% of the subjects successfully guessed the experimenter’s real rule, and replications since then have continued to show success rates of around 20%.

The study was called “On the failure to eliminate hypotheses in a conceptual task.” Subjects who attempt the 2-4-6 task usually try to generate *positive* examples, rather than *negative* examples—they apply the hypothetical rule to generate a representative instance, and see if it is labeled “Yes.”

Thus, someone who forms the hypothesis “numbers increasing by two” will test the triplet 8-10-12, hear that it fits, and confidently announce the rule. Someone who forms the hypothesis X-2X-3X will test the triplet 3-6-9, discover that it fits, and then announce that rule.

In every case the actual rule is the same: the three numbers must be in ascending order.

But to discover this, you would have to generate triplets that *shouldn’t* fit, such as 20-23-26, and see if they are labeled “No.” Which people tend not to do, in this experiment. In some cases, subjects devise, “test,” and announce rules far more complicated than the actual answer.
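A minimal sketch of why positive tests stall here, assuming a subject whose hypothesis is "numbers increasing by two." The function names (`hidden_rule`, `hypothesis`, `informative`) and the particular test triplets are illustrative choices, not part of Wason's protocol. A test only teaches you something when the experimenter's answer differs from what your hypothesis predicted.

```python
def hidden_rule(t):
    """The experimenter's actual rule: the three numbers are in ascending order."""
    a, b, c = t
    return a < b < c

def hypothesis(t):
    """A subject's guess (assumed for illustration): each number is two more than the last."""
    a, b, c = t
    return b - a == 2 and c - b == 2

def informative(t):
    """True when the oracle's answer contradicts what the hypothesis predicted."""
    return hidden_rule(t) != hypothesis(t)

# Triplets the hypothesis predicts "Yes" for (positive tests).
positive_tests = [(8, 10, 12), (4, 6, 8), (10, 12, 14)]
# Triplets the hypothesis predicts "No" for (negative tests).
negative_tests = [(20, 23, 26), (1, 2, 3), (4, 6, 2)]

for t in positive_tests + negative_tests:
    note = "<- surprise" if informative(t) else ""
    print(t, "predicted:", hypothesis(t), "oracle:", hidden_rule(t), note)
```

Every triplet that increases by two is also ascending, so no positive test can ever surprise this subject. Only a triplet the hypothesis rejects, such as 20-23-26, can come back "Yes" and expose the mistake.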

This cognitive phenomenon is usually lumped in with “confirmation bias.” However, it seems to me that the phenomenon of trying to test *positive* rather than *negative* examples, ought to be distinguished from the phenomenon of trying to preserve the belief you started with. “Positive bias” is sometimes used as a synonym for “confirmation bias,” and fits this particular flaw much better.

It once seemed that phlogiston theory could explain a flame going out in an enclosed box (the air became saturated with phlogiston and no more could be released). But phlogiston theory could just as well have explained the flame *not* going out. To notice this, you have to search for negative examples instead of positive examples, look into zero instead of one; which goes against the grain of what experiment has shown to be human instinct.

For by instinct, we human beings only live in half the world.

One may be lectured on positive bias for days, and yet overlook it in-the-moment. Positive bias is not something we do as a matter of logic, or even as a matter of emotional attachment. The 2-4-6 task is “cold,” logical, not affectively “hot.” And yet the mistake is sub-verbal, on the level of imagery, of instinctive reactions. Because the problem doesn’t arise from following a deliberate rule that says “Only think about positive examples,” it can’t be solved just by knowing verbally that “We ought to think about both positive and negative examples.” Which example automatically pops into your head? You have to learn, wordlessly, to zag instead of zig. You have to learn to flinch toward the zero, instead of away from it.

I have been writing for quite some time now on the notion that the strength of a hypothesis is what it *can’t* explain, not what it *can*; if you are equally good at explaining any outcome, you have zero knowledge. So to spot an explanation that isn’t helpful, it’s not enough to think of what it does explain very well; you also have to search for results it *couldn’t* explain, and this is the true strength of the theory.

So I said all this, and then I challenged the usefulness of “emergence” as a concept. One commenter cited superconductivity and ferromagnetism as examples of emergence. I replied that non-superconductivity and non-ferromagnetism were also examples of emergence, which was the problem. But be it far from me to criticize the commenter! Despite having read extensively on “confirmation bias,” I didn’t spot the “gotcha” in the 2-4-6 task the first time I read about it. It’s a subverbal blink-reaction that has to be retrained. I’m still working on it myself.

So much of a rationalist’s skill is below the level of words. It makes for challenging work in trying to convey the Art through words. People will agree with you, but then, in the next sentence, do something subdeliberative that goes in the opposite direction. Not that I’m complaining! A major reason I’m writing this is to observe what my words *haven’t* conveyed.

Are you searching for positive examples of positive bias right now, or sparing a fraction of your search on what positive bias should lead you to *not* see? Did you look toward light or darkness?

I think something else is going on with the 2 4 6 experiment, as described. Many of the students are making an assumption about the set of potential rules. Specifically, the assumption is that most pairs of rules in this set have the following mutual relationship: most of the instances allowed by one rule are disallowed by the other rule. This being the case, the quickest way to test any hypothetical rule is to produce a variety of instances which conform with that rule, to see whether they conform with the hidden rule.

I’ll give you an example. Suppose that we are considering a family of rules, “the third number is an integer polynomial of the first two numbers”. The quickest way to disconfirm a hypothetical rule is to produce instances in accordance with it and test them. If the rule is wrong, then the chances are good that an instance will quickly be discovered that does not match the hidden rule. It is much less efficient to proceed by producing instances not in accordance with it.

I’ll give a specific example. Suppose the hidden rule is c = a + b, and the hypothesized rule being tested is c = a − b. Now pick just one random instance in accordance with the hypothesized rule. I will suppose a = 4, b = 6, so c = −2. So the instance is 4 6 −2. That instance does not match the hidden rule, so the hypothesized rule is immediately disconfirmed. Now try the following: instead of picking a random instance in accordance with the hypothesized rule, pick one not in accordance with it. I’ll pick 4 6 8. This also fails to match the hidden rule, so it fails to tell us whether our hypothesized rule is correct. We see that it was quicker to test an instance that agrees with the hypothetical rule.

Thus we can see that in a certain class of situations, the most efficient way to test a hypothesis is to come up with instances that conform with the hypothesis.
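The example above can be sketched in code, under assumptions that are mine rather than the commenter's: a and b are drawn uniformly from 1 to 20, and non-conforming values of c are drawn from −20 to 40. The function names and the helper `surprise_rates` are hypothetical. The sketch counts how often each kind of test contradicts the hypothesized rule, which is the only way a test can disconfirm it.

```python
import random

def hidden(a, b, c):
    """Hidden rule from the example: c = a + b."""
    return c == a + b

def hypothesized(a, b, c):
    """Rule under test: c = a - b."""
    return c == a - b

def surprise_rates(trials=1000, seed=0):
    """Fraction of tests on which the oracle contradicts the hypothesized rule."""
    rng = random.Random(seed)
    conforming = nonconforming = 0
    for _ in range(trials):
        a, b = rng.randint(1, 20), rng.randint(1, 20)
        # Instance built to satisfy the hypothesis, so it predicts "Yes".
        if not hidden(a, b, a - b):
            conforming += 1
        # Instance that violates the hypothesis, so it predicts "No".
        c = rng.randint(-20, 40)
        while hypothesized(a, b, c):
            c = rng.randint(-20, 40)
        if hidden(a, b, c):
            nonconforming += 1
    return conforming / trials, nonconforming / trials

pos_rate, neg_rate = surprise_rates()
print(f"conforming tests that disconfirm:     {pos_rate:.3f}")
print(f"non-conforming tests that disconfirm: {neg_rate:.3f}")
```

With b at least 1, a − b never equals a + b, so every conforming instance disconfirms the wrong hypothesis at once. A non-conforming instance does so only when c happens to equal a + b, which is rare under these assumptions. This holds for this family of rules; it says nothing about cases where one rule's instances are a subset of the other's.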

Now you can fault people on having made this assumption. But if you do, then it is still a different error from the one described. If the assumption about the kind of problem faced had been correct, then the approach (testing instances that agree with the hypothesis) would have been a good one. The error, if any, lies not in the approach per se but in the assumption.

Finally, I do not think one can rightly fault people for making that assumption. For, it is inevitable that very large and completely untested assumptions must be made in order to come to a conclusion at all. For, infinitely many rules are consistent with the evidence no matter how many instances you test. The only way ever to whittle this infinity of rules consistent with all the evidence down to one concluded rule is to make very large assumptions. The assumption that I have described may simply be the assumption which they made (and they had to make some assumption).

Furthermore, whatever assumptions people make (and they must make some, because of the nature of the problem), a clever scientist can learn what assumptions people tend to make and then violate those assumptions. So no matter what people do, someone can come along, construct an experiment in which those assumptions are violated, and then say, “gotcha” when the majority of his test subjects come to the wrong conclusions (because of the assumptions they were making which were violated by the experiment).

Another serious problem is that the students must make the necessary assumption that the rule be simple. In the context of school, simple is generally “most trivial to figure out”.

This is a necessary assumption because there could be rules that would not be possible to determine by guessing. For example, you’d have to spend the lifetime of the universe guessing triplets to correctly identify that the rule is “Ascending integers except sequences containing the 22nd Busy Beaver number”, and then you still wouldn’t know if there’s some other rider.

If it was said, “It will require several more guesses to figure out the rule, but not more than a couple dozen, and the sequences you have don’t fully tell you what the rule is”, the exercise would be a lot more sane. At worst, the only mistake the students made was assuming that the exercise was supposed to be *too* simple. Which is like asking them to be mind readers: I’m thinking of a problem; on a scale of 1-10, please guess how difficult it is to solve.

The problem is not that they are trying examples which confirm their hypothesis; it’s that they are trying *only* those examples which test their hypothesis.

The article focuses on testing examples which don’t work because people don’t do this enough. Searching for positive examples is (as you argue) a necessary part of testing a hypothesis, and people seem to have no problem applying this. What people fail to do is to search for the negative as well.

Both positive and negative examples are, I’d say, equally important, but people’s focus is completely imbalanced.

In the situation you described, it would be necessary to test values that did and didn’t match the hypothesis, which ends up working an awful lot like adjusting away from an anchor. Is there a way of solving the 2 4 6 problem without coming up with a hypothesis too early?

The problem is not that they come up with a hypothesis too early, it’s that they stop too early without testing examples that are not supposed to work. In most cases people are given as many opportunities to test as they’d like, yet they are confident in their answer after only testing one or two cases (all of which came up positive).

The trick is that you should come up with one or more hypotheses as soon as you can (maybe without announcing them), but test both cases which do and don’t confirm it, and be prepared to change your hypothesis if you are proven wrong.

If it requires a round-trip of human speech through a professor (and thus the requisition of the attention of the entire class) then you can hardly say they are given as many opportunities to test as they’d like. A person of functioning social intelligence certainly has no more than 20 such round-trips available consecutively, and less conservatively even 4 might be pushing it for many.

Give them a computer program to interact with and *then* you can say they have as many opportunities to test as they’d like.

Come up with several hypotheses in parallel, perhaps?

Sooo many double posts! This new interface is buggy as @#$!

Following what Constant has pointed out, I am wondering if there is, in fact, a way to solve the 2 4 6 problem without first guessing, and then adjusting your guess.


Robin, I observe that Nature also fails to live up to the usual standards of an economics experiment.

Stuart and Constant, in AI/machine learning we have a formal notion of “strictly more general concepts” as those with a strictly greater set of positive examples, and symmetrically for strictly more specific concepts. (This is not usually what I mean when I say “concept” but this is the term of art in machine learning.)

Positive bias implies that people look at a set of examples and a starting concept, and try to envision a strictly more specific concept: for example, “ascending by 2 but all numbers positive”. We seem to focus less on finding a strictly more general concept, such as “separated by equal intervals” or “in ascending order” or “any sequence not ending in 2”.
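One way to make “strictly more general” concrete is to enumerate each concept's positive examples over a small finite domain and compare the sets. The domain (integers from −3 to 7), the dictionary `CONCEPTS`, and the helper names below are assumptions chosen for illustration. On a finite domain, one concept is strictly more general than another exactly when the second's positive examples form a proper subset of the first's.

```python
from itertools import product

DOMAIN = range(-3, 8)                      # assumed small domain: -3 .. 7
TRIPLES = list(product(DOMAIN, repeat=3))

CONCEPTS = {
    "ascending by 2, all positive": lambda a, b, c: a > 0 and b - a == 2 and c - b == 2,
    "ascending by 2": lambda a, b, c: b - a == 2 and c - b == 2,
    "equal intervals": lambda a, b, c: b - a == c - b,
    "ascending order": lambda a, b, c: a < b < c,
}

def positives(name):
    """Set of triples in the domain that the named concept labels 'Yes'."""
    return {t for t in TRIPLES if CONCEPTS[name](*t)}

def strictly_more_general(general, specific):
    """True when `specific`'s positives are a proper subset of `general`'s."""
    return positives(specific) < positives(general)

for general in ("equal intervals", "ascending order"):
    print(general, "is strictly more general than 'ascending by 2':",
          strictly_more_general(general, "ascending by 2"))
```

Note that “equal intervals” and “ascending order” are not comparable with each other: 1-2-4 is ascending but unevenly spaced, and 3-2-1 is evenly spaced but descending. Generality is only a partial order over concepts.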

Why do we only look in the more-specific direction and see only half the universe of concepts? Instinct, one might simply say, and be done with it. One might try a Bayesian argument that any more general concept would concentrate its probability mass less, and do a poorer job of explaining the positive examples found; for it seems that 10-12-14 is a less likely thing to see if the generator is “any sequence” than if it is “any sequence separated by intervals of 2”. But this is an invalid argument if you are the one generating the examples! As for the initial example being misleadingly specific, heck, people read nonexistent coincidences into Nature all the time. It may not be fair of the experimenter but it is certainly *realistic* as a test of a rationalist’s skill.

If you are testing examples in an oracle, “positive” and “negative” are *symmetrical* labels. This point alone should make it very clear that, from the standpoint of probability theory, we are dealing strictly with a bizarre quirk of human psychology.

“Are you searching for positive examples of positive bias right now, or sparing a fraction of your search on what positive bias should lead you to *not* see? Did you look toward light or darkness?”

Your hypothesis is that positive biases are generally bad. It is thus my duty to try and disprove your idea, and see what emerges from the result.

Let’s take your example, but now the sequences are ten numbers long and the initial sequence is 2-4-6-10-12-14-16-18-20-22 (the rule is still the same). Picking a random ordering of a given set of ten distinct numbers, we have only one chance in 10! = 3628800 of coming up with one that obeys the rule. Someone following the approach you recommended would probably first try one instance of “x,x+2,x+4...” or “x,2x,3x,...”, then start checking a few random sequences (getting “No” on each one, with near certainty). In this instance, disregarding positive bias doesn’t help (unless you do a really brutal amount of testing). This is not just an artifact of “long” sequences: had we stuck with sequences of three numbers, but with the rule “all in ascending order, or one number above ten trillion”, then finding the right rule would be just as hard. What gives?
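The arithmetic can be checked directly. The sketch below assumes the tester draws uniformly random orderings of ten distinct numbers; the helper `count_ascending` is a hypothetical name used only to brute-force a smaller case, since enumerating all orderings of ten numbers is slow.

```python
import math
from itertools import permutations

n_orderings = math.factorial(10)     # orderings of ten distinct numbers
p_yes = 1 / n_orderings              # chance a random ordering is ascending
p_all_no = (1 - p_yes) ** 100        # chance that 100 random tests all say "No"

def count_ascending(n):
    """Brute-force count of strictly ascending orderings of n distinct numbers."""
    return sum(
        1
        for p in permutations(range(n))
        if all(x < y for x, y in zip(p, p[1:]))
    )

print("orderings of ten numbers:", n_orderings)
print(f"P(100 random tests all 'No') = {p_all_no:.6f}")
print("ascending orderings of five numbers:", count_ascending(5),
      "out of", math.factorial(5))
```

Exactly one ordering of any set of distinct numbers is ascending, so random negative probing almost never produces a “Yes” at this length.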

Even worse, suppose you started with two assumptions: 1) the sequence is x,2x,3x,4x,5x,… 10x 2) the sequence is x, x+2, x+4,… x+18

You do one or two (positive) tests of 1). They come up “yes”. You then remember to try and disprove the hypothesis, try a hundred random sequences, coming up with “no” every time. You then accept 1).

However, had you just tried to do some positive testing of 1) *and* 2), you would very quickly have found out that something was wrong.

Analysis: Testing is indeed about trying to disprove a hypothesis, and gaining confidence when you fail. But your hypothesis covers uncountably many different cases, and you can test (positively or negatively) only a very few. Unless you have some grounds to assume that this is enough (such as the uniform time and space assumptions of modern science, or some sort of nice ordering or measure on the space of hypotheses or of observations), neither positive nor negative testing is giving you much information.

However, if you have two competing hypotheses about the world, then a little testing is enough to tell which one is correct. This is the easiest way of making progress, and should always be considered.

Verdict: Awareness of positive bias causes us to think “I may be wrong, I should check”. The correct attitude in front of these sorts of problems is the subtly different “there may be other explanations for what I see, I should find them”. The two sentiments feel similar, but lead to very different ways of tackling the problem.

This experiment isn’t up to the usual standards of an economics experiment. When economists do such an information experiment, we give subjects some indication of the distribution that the hidden truth will be drawn from, and then we actually draw from that distribution. You can always make subjects look like fools if you give them an example that is rare given their prior expectations.

Anyone who finds the game described at the top of the article interesting, check out Zendo, a game based upon a similar idea. I’ve found Zendo handy when explaining the concept in the OP and the various other ideas of experimental design and inductive investigation. Plus, it’s lots of fun. :-)

Zendo is my go-to exercise for explaining just about any idea in inductive investigation. (But it’s even more useful as a tool for reminding myself to do better. After years, the number of Zendo games I lose due to positive bias is still far higher than I’d like… even when I think I’ve taken steps to avoid that.)

As my group’s usual Zendo Master, I have a lot of players fall into this trap. I like to train new players with one easy property like “A Koan Has The Buddha Nature If (and only if) it contains a red piece.” Once they understand the rules, I jump to something like “A Koan Has The Buddha Nature *Unless* It contains exactly two pieces.”

Switching from a positively-marked property (there is a simple feature which all these things *have*) to a negatively-marked property (there is a simple feature which all these things *lack*) can be pretty eye-opening.

I showed Zendo to a math professor once who fell smack into the 2-4-6 trap and tried to build as many white-marked koans as possible. He even asked why the game didn’t punish people for just making the same koan over and over again, since it would be guaranteed to “follow the rule.” I eventually managed to convey that the object of the game is to be able to tell me, in words, what you think the rule is. Since then I’ve been more explicit that “part of the game involves literally just saying, out loud, what you think defines the property.” People always seem to think that the zendo is a sort of a silent lecture, when really it’s more of a laboratory class.

Maybe this provides some insight into the nature of positive bias. In the game, the only goal is to find the rule; there is no punishment for asking a wrong sequence. But I guess the real life is *not* like this. In real life, especially in the ancient environment, making a wrong guess is costly; and our cognitive algorithms were optimized for that.

For example, imagine that the rule is some taboo, punishable by death. It is better to avoid the punishment, than to find the boundaries precisely. Avoiding a *superset* of the taboo also has some cost, but that cost is probably cheaper than being stoned to death. If you know that the sequence “2-4-6” does *not* get you killed (unlike some other sequences, not explicitly known which ones), it may be wise to guess “2-4-6” over and over again.

Constant made an important point: “infinitely many rules are consistent with the evidence no matter how many instances you test.” Therefore any guess you make must be influenced by prior expectations. And like lusispedro said, based on experience students probably put a lot more weight on rules based on simple equations than rules based on inequalities.

I’m sure I could get the percentage of people who guess correctly down to 0% by simply choosing the perfectly valid rule: “sequences (a,b,c) such that EITHER a less than b less than c OR b is a multiple of 73.”

Why? Because rules of that sort are given low weight in subjects’ priors.
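A rough sketch of how seldom the rider in that rule would show up in testing, assuming (my assumption, not the commenter's) that subjects only try triples of integers from 1 to 100. It counts the triples on which plain “ascending” and “ascending, or b a multiple of 73” give different answers.

```python
from itertools import product

def ascending(a, b, c):
    """The plain rule: a < b < c."""
    return a < b < c

def ascending_or_73(a, b, c):
    """The commenter's rule: ascending, or the middle number is a multiple of 73."""
    return a < b < c or b % 73 == 0

N = 100  # assumed range of numbers a subject might try
total = 0
disagreements = 0
for t in product(range(1, N + 1), repeat=3):
    total += 1
    if ascending(*t) != ascending_or_73(*t):
        disagreements += 1

print(disagreements, "of", total, "triples distinguish the two rules")
print(f"fraction: {disagreements / total:.4%}")
```

Under this assumption the two rules disagree only on non-ascending triples whose middle number is exactly 73, so a subject would have to test such a triple deliberately to notice the difference.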

We’re playing a game in which you, the player, start with a number sequence. There is a rule governing which number comes next, and whoever determines the rule will recieve $10. Any one can play, but I tagged the people who i think will be most interested.

If you guess a number, I will tell you if it is correct, and if so, I will add it to the existing sequence. Please only guess one number each day. Please only guess one number at a time, dont try and fill in a section of the sequence.

If you guess the rule, I will tell you if you are correct or incorrect. If correct, you win $10. If incorrect, you may not guess the rule again for 3 days.

Original sequence:

2, 4, 6

The sequence so far is:

2, 4, 6, 10, 18, 30, 50, 82, 134, 218, 354, 622, 623, 630

Craig Fleischman (Indiana) wrote at 7:44pm on July 13th, 2007: 10?

Dan Margolis (Japan) wrote at 8:31pm on July 13th, 2007: 7

Jeff Borack wrote at 1:03am on July 14th, 2007: 10 yes, 7 no

Dan Margolis (Japan) wrote at 9:52am on July 14th, 2007: Its like fibonacci sequence except starting at 2. The next digit is the sum of the two previous digits. So it would be 2, 4, 6, 10, 16, 26, 42, 68, 110...

So… X0 = 2, X1 = 4, Xn = (Xn-1 + Xn-2)

Jeff Borack wrote at 12:16pm on July 14th, 2007: Incorrect

Dan Margolis (Japan) wrote at 12:38pm on July 14th, 2007: Worth a shot...I can’t deduce much from so few numbers…

Elliot Alyeshmerni wrote at 7:06pm on July 14th, 2007: im gonna go with 18

Jeff Borack wrote at 8:24pm on July 14th, 2007: a job well done

Yvette Monachino wrote at 8:08pm on July 15th, 2007: 30

Jeff Borack wrote at 10:47am on July 16th, 2007: 30 works

Yvette Monachino wrote at 11:06am on July 16th, 2007: 50

Jeff Borack wrote at 11:35am on July 16th, 2007: good

Elliot Alyeshmerni wrote at 2:25pm on July 16th, 2007: 82, still havent gotten the sequence down so this is a bit of a guess

Jeff Borack wrote at 2:33pm on July 16th, 2007: good

Elliot Alyeshmerni wrote at 3:19pm on July 16th, 2007: i think we all got this sequence now..

Jeff Borack wrote at 3:36pm on July 16th, 2007: i dont think anyone has it. but i welcome you to guess. If your right, $10. If your wrong, at least you’ll save yvette! Good luck.

Peter Dahlke wrote at 7:31pm on July 16th, 2007: 134 next?

Jeff Borack wrote at 8:07pm on July 16th, 2007: yup

Elliot Alyeshmerni wrote at 10:49pm on July 16th, 2007: 218

Jeff Borack wrote at 11:15pm on July 16th, 2007: 218

Victor Baranowski wrote at 10:16am on July 17th, 2007: IDK where it started, but assuming we started with 2, 4, 6 the sequence is:

Xn = X (n-1) + [(X(n-1) - X(n-2)) + (X(n-2)-X(n-3))]

or something like that…

Jeff Borack wrote at 10:26am on July 17th, 2007: Interesting guess, I thought people were gonna say Xn = X(n-1)+X(n-2)+2, but both are wrong. Sorry Vic. The more interesting question is: why did it take so long for someone to guess? Is the reward for guessing the correct answer to low or is the penalty to high?

Jeff Borack wrote at 10:45am on July 17th, 2007: I’m changing the rule of 1 rule guess/week. You can now guess once every three days. Numbers are still once a day even though elliot broke that rule and i accepted the number.

Elliot Alyeshmerni wrote at 11:43am on July 17th, 2007: this a answer works for every number except 6 and 18, but i’ll put it down anyway

X(n)=2X(n-1)-X(n-3)

Victor Baranowski wrote at 12:22pm on July 17th, 2007: Ya, that was similar to mine. Why the sequence goes from 10 to 18 is the tricky part of this whole thing, which makes me think the equation is going to be pretty ugly or wierd… maybe jeff made a mistake :P

Victor Baranowski wrote at 12:23pm on July 17th, 2007: Oh, and I might as well guess 354…

Jeff Borack wrote at 2:19pm on July 17th, 2007: a) the solution is beutiful b) i didn’t make any mistakes yet c) 354 is good

Victor Baranowski wrote at 2:46pm on July 17th, 2007: Can I cite a) in response to your b) ?

Jeff Borack wrote at 8:20pm on July 17th, 2007: Hmmmm, I’m not sure. It depends on when you think the mistake was made. Technically it did come before b), but i could also argue that the mistake what made when i clicked the “Add your comment” button.

a) the solution is… very nice and good b) i didn’t make any mistakes in the number sequence yet. c) web browsers and AIM should have spell checkers. this isn’t the 20th century anymore.

Tait Kowalski wrote at 3:48pm on July 18th, 2007: Sequence goes x(n) = x(n-1)+2*x(n-3)

so the next number = 354 + 2*134 = 622

next number is 622

Jeff Borack wrote at 4:41pm on July 18th, 2007: Welcome Tait! That is the wrong rule, but ill accept your guess at the next number.

Elliot Alyeshmerni wrote at 6:28pm on July 19th, 2007: the next number is fuck you jeff, just give us the answer lol

Jeff Borack wrote at 6:47pm on July 19th, 2007: Sorry elliot, want me to call the Whaaaaaaaaaaaaaaaambulance?

Victor Baranowski wrote at 4:29pm on July 22nd, 2007: is the next number 620?

Jeff Borack wrote at 7:23pm on July 22nd, 2007: hmm strange guess. 620 is not a number

Victor Baranowski wrote at 8:46am on July 23rd, 2007: howabout 623?

Jeff Borack wrote at 12:40pm on July 23rd, 2007: : ) 623 is the next number

Craig Fleischman (Indiana) wrote at 12:55pm on July 23rd, 2007: 630?

Jeff Borack wrote at 12:59pm on July 23rd, 2007: 630 is good

Victor Baranowski wrote at 1:51pm on July 23rd, 2007: Solution: the next number is whatever number is guessed, as long as it is higher than the previously guessed number.

Jeff Borack wrote at 2:28pm on July 23rd, 2007: hahaha, yup. it took a lot of time but not a lot of guesses. i expected the guessing to to into the hundreds of thousands. do you accept paypal?

Victor Baranowski wrote at 2:35pm on July 23rd, 2007: no, i accept shots and beers the next time we hang out.

Yvette Monachino wrote at 4:10pm on July 27th, 2007: that is the dumbest sequence i have ever heard of

Jeff Borack wrote at 5:08pm on July 27th, 2007: It’s about thinking outside the box, yvette, something i wouldnt expect most MATH majors to understand! : p Victory for the engineers!!!

Yvette Monachino wrote at 2:08pm on July 30th, 2007: aw thats a cute remark, knowing that you don’t actually know what real math is i won’t take that as an insult, and the only victory you accomplished is adding yourself to the long list of pompous engineers, so congrats :)

Jeff Borack wrote at 2:52pm on July 30th, 2007: While I might be pompous, I unfortunately can’t be considered much of an engineer. I did bioengineering, which certainly doesnt count, and i’ve never actually engineered anything. Neither has vic, hes in law school.

It is true that i don’t know what real math is (although i would love for you to teach me). However, I would imagine that real math does involve thinking outside the box on occasion. In this particular example, it required you to test a number you thought was not part of the sequence. If you believed you had found the sequence, and contintued to test numbers that fit that sequence, you would never derive the answer. By simply testing a number that does not appear to fall into the sequence, such as 2 million, it’s easy to find the solution.

Does this sound like any ‘real’ math problems you have ever encountered?
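As a check on the thread above, the announced solution can be replayed against the logged guesses. This is a sketch only: the function `replay` is a hypothetical helper, and the guess list is transcribed from the thread in posting order.

```python
def replay(start, guesses):
    """Apply the announced rule: a guess is accepted if it exceeds the last accepted number."""
    seq = list(start)
    rejected = []
    for g in guesses:
        if g > seq[-1]:
            seq.append(g)
        else:
            rejected.append(g)
    return seq, rejected

# Number guesses from the thread, in the order they were posted.
guesses = [10, 7, 18, 30, 50, 82, 134, 218, 354, 622, 620, 623, 630]
seq, rejected = replay([2, 4, 6], guesses)

print("accepted sequence:", seq)
print("rejected guesses: ", rejected)

# A guess nobody expected to fit any recurrence is also accepted.
print("2 million accepted:", 2_000_000 > seq[-1])
```

The replay reproduces the posted sequence and rejects only 7 and 620, the two guesses that fell below the previous number. Every guess that fit a player's hypothesized recurrence was accepted, which is why those guesses told the players nothing.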

I’m not sure I buy the whole ‘subverbal’ thing—it seems to me that misleading phrasing is a big part of the problem. If asked to find the “rule” which “governs” a sequence of three numbers, I’d (incorrectly …) assume that the questioner was thinking of some simple rule that can be used to generate all of the valid sequences. Given the examples, I’d guess it was something like ‘x x+2 x+4’ or ‘2x 2(x+1) 2(x+2).’ Now, after I started typing this I realized that you could map all ascending 3 integer sequences to the whole numbers, so there is a “rule” that could be used to generate the solution, but nobody would look at the solution in these terms naturally—instead, we think of the solution as the set of sequences with the “property” of being in ascending order. If the questioner said that he was thinking of “a property which sequences of 3 numbers either have or lack,” rather than a “rule” which “governs” the sequences, I suspect more folks would discover the correct solution.

After seeing the four examples (including one that didn’t fit) given, it didn’t even occur to me that someone could think the first one indicated a X-2X-3X pattern. It’s hard to tell what will confirm and what will disconfirm in such a broad space of possibilities.

A bit off topic but after numerous incidents of mocking Eliezer, Mencius Moldbug has launched a full-scale assault on Bayesianism. He hasn’t shown any inclination to post his critiques here, but perhaps some of the luminaries here could show him the error of his ways.

I think the Wason selection task with cards is an even more direct demonstration of the tendency to seek confirmatory, but not disconfirmatory, tests of a hypothesis.

Why is it that I suspect Constant didn’t guess the rule properly?

Isn’t it the entire point of the post that confirmation bias is the tendency NOT TO CHECK ASSUMPTIONS?

Building on the previous commenter:

Through playing various games of this sort, people develop a prior on the space of rules which has a lot of mass around rules of the type “X,X+2,X+4” or “X,2X,3X”.

That is a good link, Ambitwistor. The last paragraph refers to an interesting psychological hypothesis, which I’d like to expand on in an example related to the “Look Into the Dark” post. Let’s rephrase EY’s proposition to give it more of a social “plot”.

“You’re a smuggler in a strange foreign land, where they only allow exports of goods in certain combinations of quantities, so as to keep their domestic lobby groups happy. [Yes, it’s a convoluted example, but governments can be convoluted.] Trouble is, everyone knows the rule except you and your gang of smugglers, and if you ask, you become a suspect. Furthermore, you don’t actually know what you’re smuggling, since your fence always seals them in the standard export containers, which are numbered ordinally “First”, “Second”, and “Third”.

Since you’re an amoral smuggler boss, in charge of a lot of obedient “mule” underlings, you can send as many people through customs as you want and no matter how many get arrested, you won’t be a suspect. Also, you have an infinite number of empty export containers with the usual “First”, “Second” or “Third” labels. If your mule gets arrested with an empty container, he’ll be released immediately. So basically, you can test the rule all you want, since you’ll witness any arrest that happens.

Just as you and your team of criminals arrive at the customs checkpoint, a man goes into customs with 2 “First” boxes, 4 “Second” boxes, and 6 “Third” boxes. You can start making a tidy profit as soon as you determine what the rule is. What is your next move?”

Granted, it is a convoluted example and I’m worried that in its current form it would just confuse too many test subjects. Perhaps someone would think of a more straightforward equivalent. The point, though, is to make the test sound less like the sort of rule we are familiar with from math class. As several posters have alluded, usually a rule in math class is much stricter and requires some arithmetic. A bureaucratic rule, convoluted though it may be, will often be mathematically simpler. E.g., “The extremely powerful pineapple lobby has pushed through a law requiring that no other fruit (papaya or mango) be exported in greater numbers than pineapples. Exports from the politically weak mango industry must not exceed papaya exports. Pineapples are labelled Third; papayas labelled Second; mangos labelled First.”

My hypothesis is that people will come up with this rule faster than they would when faced with the phrasing from the original post. (Of course, the “domestic lobby groups happy” phrasing is sort of a giveaway … maybe it should be replaced with a more neutral explanation, or none at all.)
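For concreteness, the fruit-lobby rule quoted above can be written as a one-line check. This is only a sketch: the function name `export_allowed` and its parameter names are made up here, and it reads “must not exceed” as allowing equal quantities, which makes it slightly looser than a strictly ascending rule.

```python
def export_allowed(first, second, third):
    """Hypothetical customs check for the fruit-lobby rule.

    first = mangos, second = papayas, third = pineapples.
    Mangos must not exceed papayas, and nothing may exceed pineapples.
    """
    return first <= second <= third


print(export_allowed(2, 4, 6))  # the shipment seen at the checkpoint
print(export_allowed(4, 6, 2))  # more papayas than pineapples
print(export_allowed(3, 3, 3))  # equal quantities do not "exceed" anything
```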

It seems very normal to expect that the rule will be more restrictive or arithmetic in nature. But if I am supposed to be sure of the rule, then I need to test more than just a few possibilities. Priors are definitely involved here.

Part of the problem is that we are trained like monkeys to make decisions on underspecified problems of this form all the time. I’ve hardly ever seen a “guess the next [number|letter|item] in the sequence” problem that didn’t have multiple answers. But most of them have at least one answer that feels “right” in the sense of being simplest, most elegant or most obvious or within typical bounds given basic assumptions about problems of that type.

I’m the sort of accuracy-minded prick who would keep testing until he was very close to *certain* what the rule was, and would probably take forever.

An interesting version of this phenomenon is the game “Bang! Who’s dead”. One person starts the game, says “Bang!”, and some number of people are metaphorically dead, based on a rule that the other participants are supposed to figure out (which is, AFAIK, the same every time, but I’m not saying it here). The only information that the starter will give is who is dead each time.

Took me forever to solve this, because I tend to have a much weaker version of the bias you consider here. But realistically, most of my mates solved this game much faster than I did. I suspect that this “jump to conclusions” bias is useful in many situations.

Eliezer, yes sometimes nature includes rare events, but only rarely. We should evaluate human inference abilities on average across the kinds of cases humans face, and not just for rare surprising events.

The plethora of incorrect hypotheses compared to the relatively few correct (so far) theories seems to speak against this.

Stuart, you *do* have a “nice ordering measure”: simpler hypotheses (“all ascending”) have a higher prior probability than complex ones (“all ascending OR one over ten trillion” or randomness). Positive testing of contradictory, high-prior-probability hypotheses is still negative testing of your original hypothesis, no?

Is my idea of why this is in Mysterious Answers correct? Due to positive bias you don’t try to falsify a theory, and if a theory does not predict anything for the negative case, then it does not have any predictive value and thus is a mysterious answer.

Thought experiment. Suppose you have *two* oracles, and your task is to find out whether or not they have the same rule. If each oracle is considered as “A lookup table produced by a coin flip for each possible input, except that there’s a 50% chance that the second is just a copy of the first” then of course any input is as likely as any other to exhibit a difference, and you can easily compute the probability of no difference after *n* tests fail to exhibit one. But if you have an assumption that simpler rules are more likely (e.g. your prior is 2^-complexity) then what’s your optimal strategy?

A plausible strategy is to follow the same strategy as you would if you had to find the rule of a single oracle; you always send the input that gives you the most bits about Oracle A’s rule. That way, you maximise the probability of exhibiting a difference given that one exists. So if you can generate an input which, under your current model of the space of A’s possible rules (and the probability of each), has exactly a 50% chance of matching A, then it also has a 50% chance of matching B; moreover these probabilities are independent, so you have 25%+25%=50% chance of exhibiting a difference. If instead you picked an input with a 30% chance of matching A, your chance of exhibiting a difference is 21%+21%=42%.
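The arithmetic in the last paragraph can be checked with a few lines. This sketch assumes, as the comment does, that an input matches each oracle with the same probability `p` and that the two answers are independent; the function name is made up for illustration.

```python
def difference_probability(p):
    """Chance that two independent oracles disagree on an input
    that each of them accepts with probability p."""
    # A says yes and B says no, or A says no and B says yes.
    return p * (1 - p) + (1 - p) * p


for p in (0.5, 0.3, 0.1):
    print(f"p = {p:.1f}: chance of exhibiting a difference = "
          f"{difference_probability(p):.2f}")
```

Under these assumptions the value is 2p(1 − p), which is largest at p = 0.5, so the query that is most informative about Oracle A is also the one most likely to expose a difference.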

Flynn, you write, “Isn’t it the entire point of the post that confirmation bias is the tendency NOT TO CHECK ASSUMPTIONS?”

You simply can’t check all your assumptions in finite time in this task, which is a problem, because you must complete the task in a finite time. That is not your fault—that is intrinsic in the challenge. Therefore some of your assumptions will necessarily go untested—and they will necessarily be enormous assumptions. The reason for this is that the set of possible rules is too large—it’s infinite—and remains infinite no matter how much testing you do.

See also Stuart’s comment and Robin’s comment. I think they express major points I was trying to make, more clearly than I did.

I wonder why no one cares to mention Ockham’s Razor in this situation. As has already been mentioned a couple of times, there are infinitely many rules that can describe a finite set of numbers. So we can only start at the least restrictive rule possible and work our way inward, until we reach a point where, within a certain interval of time, we can no longer find a set of numbers that fits our rule but not the rule we are looking for. So I start by saying it’s all numbers. Obviously I’ll find a couple of sets that don’t match the correct rule. I’ll then try whole numbers. After that I might try ascending numbers, or at least a>b or b>c… The only important thing here is to find the simplest solution still possible.

So I actually wouldn’t try finding anything that doesn’t fit my assumptions, since there would be far more sets that fit neither my assumption nor the solution.

I think most people would come up with the correct answer ‘with extension’. Such as ‘increasing by 2 in ascending order’ where the correct answer ‘ascending order’ is the basis that they have then specified further. In my eyes they have then given a partially correct answer and should not strive so hard to ‘avoid this mistake’ in the future. My reasoning is that you might then ‘dismiss out of hand’ a partially correct answer and by default do the same to the ‘fully correct answer’. It is better then, to make a habit out of breaking down a hypothesis before dismissing it. Or you could just use up all your energy on convincing yourself that nothing should be believed, ever. Since belief means to know without proof.

Hey there! Welcome to Less Wrong!

I’d say you should read the Sequences, but that’s clearly what you’re doing :D. I’d suggest going ahead and introducing yourself over here.

I agree with you that some people might come up with the rule, but with unnecessary additions. The point of looking into the dark is that people may tend to add on to those extensions, when they should really be shaving them down to their core. And they can only do so (Or at least do so more effectively.) by looking into the dark.

Also, that’s not exactly the commonly accepted definition of “Belief” around here. For what most would think of when you refer to “belief” check out here, here, and the related The Simple Truth article, and really the entire Map and Territory sequence.

Again, welcome!

Funny, “three numbers in ascending order” was the first hypothesis that popped in my mind.

One thing that helped me really get this one is testing software upgrades. It’s insanely tedious. Most stuff just keeps working. But if you *don’t* test, you’re just *asking* for something to come back and bite you in the backside.

e.g. recent work example: upgrading Tomcat 6.0.16 to 6.0.29. Minor point release from the Apache Software Foundation, computer scientists famous for their dedication to engineering stability. I so didn’t want to bother testing this at all (days and days of tedium). Then this bit us: someone decided the letter of the spec beat mountains of real-world code in a stable branch maintenance release. And it’s in mountains of real-world code because of this. My opinion of Apache slipped somewhat. But my systems stayed up.

I still hate lining up testing, but a few of these and you start to expand your map of chances large enough to mess you up. Sysadmins know that computers are evil and out to get them, and that the only way around this is not to give them the opportunity.

A friend of mine has a similar story involving why he *never* allows code-changes after code freeze dates, even if X, even if Y, even if Z. His story, however, involves avatars in a video game sorting their layers in strange ways on obscure video cards to cause breastplates to unexpectedly sort below breasts, which is why I still remember it.

It’s like backups or freedom 0. Approximately no-one gets it until they’ve been bitten in real life. (I am particularly bad at learning *without* direct application of forehead to concrete, but am attempting to think more clearly.)

I have two observations, one personal and one general:

Once, I tried to apply artificial neural nets to the task of evaluating positional situations in the game of Go. I made a very basic error, which was to train the net only on positive examples. The net quickly learned to give high scores for these, but when I tested it on bad situations it still reported high scores. Maybe a slightly naive mistake, but you have to learn sometime.

A very common example is the testing of software. Usually, people pay a lot of attention to testing the positive cases and verifying that they work as they should. Less time is spent on testing things that should not work, sometimes resulting in programs that generate answers when they should not. The problem here is that the positive cases usually form a limited set, while the negative cases are almost infinite.

You need to do a lot more to demonstrate irrationality than this. Obviously, as other commenters have pointed out, there are an infinite number of rules that agree with any given finite sequence of experimental results so obviously you can never conclusively demonstrate that your rule is indeed the correct one. Moreover, you can’t even be ‘bias free’ in the sense of assigning all possible rules the same probability unless you want to assign each rule probability 0.

Now you might be tempted to just give up at this point but this is exactly the same problem we face when doing science. We have an infinite number of possible rules that extend the results we have seen so far and we need to guess which is most likely. Amazingly we do it pretty well but justifying it seems impossible, it’s the classical philosophical problem of induction.

In short it’s not clear anyone is ‘wrong’. Maybe they have a good initial probability distribution for what sorts of rules people normally pick. Heck it’s not even clear what it means to be ‘wrong’ in this sense, i.e., having an implausible a priori probability distribution.

Teacher: “In ‘Beast and Man in India’ John Kipling describes a custom of how gypsies ransomed crows to Hindus. A gypsy would catch a crow, peg it on the ground spread-eagled so that it cannot escape, and when another bird would fly to attack it, the first one, defending itself, catches it with its legs. When the gypsy has enough crows, he goes to a shop of some rich Hindu and offers to let them go, for a price, or eat them for dinner. The Hindu pays one or two paisas for a bird. Let the probability of the crow not flying away when it is pegged down be 90%, and the probability of it catching another one be 95%. If by the end of the day the gypsy has 16 paisas, what is the least number of birds that will have been on the ground?” …and the correct answer is zero :)

I wonder if this can not be partially explained by people wanting to answer quickly. The teacher says you can make as many guesses as you like, but we still instinctively feel like we do better if we do it faster.

Imagine the same test, but now with the last line reading: “You can make as many guesses as you like, but you get graded on how fast you get the right result”. With that rule it is a lot more rational to not spend too much time on verification of your hypothesized rule. I have no idea what the best strategy is, I guess it depends on your priors about the rule-space, but it probably does not involve spending a lot of questions on falsification.

My guess is that many people approach the problem as if it is of the above variety, even though it isn’t. So while positive bias no doubt plays a part, I think a desire to answer quickly also factors hugely.

This is testable. Give people a 10 dollar reward for giving the correct answer, and explicitly tell them that the number of guesses does not affect this reward. I hypothesize that the fraction of people getting the correct answer will go up significantly.

(I know this is a very old thread, but this sequence still features prominently on the site, so I have some hopes that people still read this occasionally :P)

I think there is a simple approach to handling these problems. First define a number that no one knows anything about. Say BB(10), where BB is the busy beaver function. No one knows anything much about the size of this number, whether it’s odd or even, etc. Then if someone says yes to:

BB(10), BB(10) + 2, BB(10) + 4 you can infer they probably really are using rule: n, n+2, n+4.

If it’s not this rule they may need to say they can’t tell if the sequence follows the rule or not.

Unless they are using very general and hard-to-guess rules this method seems effective. An example of an absurdly hard-to-guess rule would be: “All numbers are less than BB(100).”

This doesn’t solve the problem. If you think the rule is n,2n,3n you could try BB(10), 2 × BB(10), 3 × BB(10) but then the rule might really be: n,kn,(k+1)n for some k. But again this method seems to me like it would give you a way to check most “easy” rules. Or at least something like this is useful in testing your theories.

Software design: if you are using a logic test, check on either side of the logic test, and also random answers.

is X > 5? if X is:

4: no

5: no

6: yes

5.00001: no

5.999999: yes

−1: error error error

“tomato”: error error error

That taught me to always double check that the hypothesis is not just a good fit, but a good enough fit for the purpose. If you never encounter a tomato, or decimals or negative numbers, then the test works fine. If you expect occasional tomatoes, and your test is looking for a positive integer, maybe it’s time for a new test.
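Here is one hypothetical check that would produce exactly the answers listed above, with a small harness that probes it on both sides of the boundary and with junk input. The rounding behaviour and the rejection of negative numbers are assumptions chosen to match that list, not something the comment specifies, and the names `exceeds_five` and `probe` are invented for this sketch.

```python
def exceeds_five(x):
    """Hypothetical test: round to the nearest integer, then compare with 5."""
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise TypeError(f"expected a number, got {x!r}")
    if x < 0:
        raise ValueError(f"expected a non-negative number, got {x!r}")
    return round(x) > 5


def probe(check, inputs):
    """Run the check on every input and record yes / no / error."""
    results = []
    for value in inputs:
        try:
            results.append((value, "yes" if check(value) else "no"))
        except (TypeError, ValueError):
            results.append((value, "error"))
    return results


# Values below, at, and above the boundary, near-boundary decimals, and junk.
for value, answer in probe(exceeds_five, [4, 5, 6, 5.00001, 5.999999, -1, "tomato"]):
    print(f"{value!r}: {answer}")
```

The decimal rows are the interesting ones: 5.00001 really is greater than 5, so whether a “no” there is acceptable depends on whether decimals can ever reach the check.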

If I was advising an AI on how to solve this question, I might recommend guessing many sets of three random numbers, and just looking at the ratio of ‘yes’ to ‘no’. A result of 1⁄6 yes could then be matched against various rules and their ratios. This would greatly reduce the solution set, and ordering would likely jump to the front as a likely possibility.

If I were answering the question for myself, I would likely try to break it, by that I mean get you to either add a new rule, or to say ‘I don’t know’. { e, i, pi }
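The yes-to-no ratio idea above can be sketched as a small simulation. The assumptions are mine: triplets are drawn as three distinct integers from 1 to 100, the examiner is played by an ordinary Python function, and the names `yes_ratio`, `ascending` and `step_of_two` are invented for illustration.

```python
import random


def yes_ratio(rule, trials=20000, low=1, high=100, seed=0):
    """Fraction of random triplets of distinct integers that the rule accepts."""
    rng = random.Random(seed)
    population = range(low, high + 1)
    hits = sum(rule(*rng.sample(population, 3)) for _ in range(trials))
    return hits / trials


def ascending(a, b, c):
    return a < b < c


def step_of_two(a, b, c):
    return b - a == 2 and c - b == 2


print(f"ascending:   {yes_ratio(ascending):.3f}  (1/6 is about {1 / 6:.3f})")
print(f"step of two: {yes_ratio(step_of_two):.3f}")
```

Three distinct numbers can be arranged in six orders and only one of them is ascending, so a ratio near 1⁄6 points toward an ordering rule, while a rule such as “increasing by two” would almost never say yes to random input.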

I intuitively wanted to see if the combination 8-6-4 or 6-4-2 would be acceptable, without actually making a guess at the rule. I looked at the two acceptable answers and the one unacceptable answer and thought, okay, but that doesn’t prove a rule. The rule the experiment wants you to think about is a pattern like 2-4-6-8-10, so let’s see if something disproves that pattern. Would 6-4-2 be acceptable? Obviously, it wouldn’t. If I wasn’t under the influence of hindsight bias I *might* continue on to try and see if different intervals were not acceptable, i.e. 2-2-2, until I could differentiate between ascending order and the intervals, but knowing me and the likelihood of anyone actually guessing the rule I would put that as a very low probability. Still, this strikes me as the kind of thing where it’s best to avoid bringing up a solution: get more information, study and discuss the information, and then try to solve it. If people did this perhaps they would come closer to getting it right?

Haha… And before I read this blog I thought I was irrational. Probably still am.

“Are you searching for positive examples of positive bias right now, or sparing a fraction of your search on what positive bias should lead you to not see?”

Isn’t *what positive bias should lead you to not see* a positive example of positive bias? Or am I explaining the joke?

It seems much of our cognitive architecture was developed in the context of social situations. Indeed, the standard experiments on checking modus ponens and modus tollens understanding show sharp increases in ability when they are presented as social rules (e.g. http://en.wikipedia.org/wiki/Wason_selection_task checking whether someone is violating the “minor drinking alcohol” rules, rather than cards gives much higher performance). Testing whether you understand a social rule by deliberately violating your current understanding can be a very, very expensive test. It seems plausible that this cost has led to the human default ways for testing implicit rules to avoid seeking out these negatives, even when the cost would be low.

We’re good at reasoning with social situations, and bad with more abstract situations. As such, we can’t be doing them the same way. Something that helps in social situations is unlikely to cause a bias in more abstract situations.

In other words, our current architecture was developed in the context of social situations, and the fact that we do significantly better in those situations shows that it’s the only time we use it. Otherwise, we use different, lousy architecture that won’t exhibit the same biases.

I just want to summarize what I learned in this thread in order to ensure that I understand it. As I understand, the steps for determining the rule should be something like this:

See sequence.

What relations do the elements share? All are numbers, integers, even, differ by two, and are in ascending order. The rule is likelier to contain each (but not all) of these as a clause than not to.

If any relation you thought of belongs to a larger class, add that class.

Try to disconfirm each relation by creating sequences that violate only this relation (as well as its descendants, necessarily). Test general attributes first, since if they fail, the descendants can be considered impossible.

Create a candidate rule which consists of all relations that were not disconfirmed.

Offer the rule to the examiner.

Quite a bit more laborious than blurting out “n[i] = n[i-1]+2”, I have to admit.
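The steps above can be sketched in code. Everything here is illustrative: `hidden_rule` stands in for the examiner and is set to the rule revealed in the post, and the probe triplets are just one choice of sequences that each break a single relation (plus its descendants) while keeping the rest.

```python
def hidden_rule(a, b, c):
    """Stand-in for the examiner; here, the rule revealed in the post."""
    return a < b < c


# One probe per relation, ordered from general to specific.
PROBES = [
    ("integers", (1.5, 3.5, 5.5)),   # not integers (so not even either)
    ("even", (1, 3, 5)),             # odd integers, gaps of two, ascending
    ("differ by two", (2, 6, 20)),   # even, ascending, uneven gaps
    ("ascending", (6, 4, 2)),        # even, gaps of two, descending
]


def surviving_relations(oracle, probes):
    """Keep only the relations whose violation made the oracle say No."""
    kept = []
    for name, triplet in probes:
        if oracle(*triplet):
            print(f"{triplet} -> Yes: '{name}' is not required")
        else:
            print(f"{triplet} -> No:  '{name}' survives")
            kept.append(name)
    return kept


print("Candidate rule:", " and ".join(surviving_relations(hidden_rule, PROBES)))
```

Every probe here is a triplet the student expects to fail under the current hypothesis, which is exactly the kind of test the post says people tend to skip. A single “No” does not prove a relation is required, so a surviving relation is still only a candidate.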

But then n[i]=n[i-1]+2 is wrong, so...

Robin, I suspect that Eliezer has a different perspective on that, given his line of work. Availability bias on which biases to overcome? The creation of a seed AI is an event so rare that it has never happened (so far as we can tell), but failure to get it right on the first try could eliminate all life in the solar system. There is perhaps room for discussing average and better inference abilities with respect to common and rare events, although we would do well to be clear on exactly what we are arguing.

I meant that first comment to be more speculative than definite. I was speculating about an alternative explanation of the observed behavior, which locates the fault elsewhere.