# The Bat and Ball Problem Revisited

Cross posted from my per­sonal blog.

In this post, I’m go­ing to as­sume you’ve come across the Cog­ni­tive Reflec­tion Test be­fore and know the an­swers. If you haven’t, it’s only three quick ques­tions, go and do it now.

One of the strik­ing early ex­am­ples in Kah­ne­man’s Think­ing, Fast and Slow is the fol­low­ing prob­lem:

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.

How much does the ball cost? _____ cents

This ques­tion first turns up in­for­mally in a pa­per by Kah­ne­man and Fred­er­ick, who find that most peo­ple get it wrong:

Almost everyone we ask reports an initial tendency to answer “10 cents” because the sum $1.10 separates naturally into $1 and 10 cents, and 10 cents is about the right magnitude. Many people yield to this immediate impulse. The surprisingly high rate of errors in this easy problem illustrates how lightly System 2 monitors the output of System 1: people are not accustomed to thinking hard, and are often content to trust a plausible judgment that quickly comes to mind.

In Thinking, Fast and Slow, the bat and ball problem is used as an introduction to the major theme of the book: the distinction between fluent, spontaneous, fast ‘System 1’ mental processes, and effortful, reflective and slow ‘System 2’ ones. The explicit moral is that we are too willing to lean on System 1, and this gets us into trouble:

The bat-and-ball prob­lem is our first en­counter with an ob­ser­va­tion that will be a re­cur­rent theme of this book: many peo­ple are over­con­fi­dent, prone to place too much faith in their in­tu­itions. They ap­par­ently find cog­ni­tive effort at least mildly un­pleas­ant and avoid it as much as pos­si­ble.

This story is very com­pel­ling in the case of the bat and ball prob­lem. I got this prob­lem wrong my­self when I first saw it, and still find the in­tu­itive-but-wrong an­swer very plau­si­ble look­ing. I have to con­sciously re­mind my­self to ap­ply some ex­tra effort and get the cor­rect an­swer.

How­ever, this be­comes more com­pli­cated when you start con­sid­er­ing other tests of this fast-vs-slow dis­tinc­tion. Fred­er­ick later com­bined the bat and ball prob­lem with two other ques­tions to cre­ate the Cog­ni­tive Reflec­tion Test:

(2) If it takes 5 ma­chines 5 min­utes to make 5 wid­gets, how long would it take 100 ma­chines to make 100 wid­gets? _____ minutes

(3) In a lake, there is a patch of lily pads. Every day, the patch dou­bles in size. If it takes 48 days for the patch to cover the en­tire lake, how long would it take for the patch to cover half of the lake? _____ days

Th­ese are de­signed to also have an ‘in­tu­itive-but-wrong’ an­swer (100 min­utes, 24 days), and an ‘effort­ful-but-right’ an­swer (5 min­utes, 47 days). But this time I seem to be im­mune to the wrong an­swers, in a way that just doesn’t hap­pen with the bat and ball:

I always have the same re­ac­tion, and I don’t know if it’s com­mon or I’m just the lone idiot with this prob­lem. The ‘ob­vi­ous wrong an­swers’ for 2. and 3. are com­pletely un­ap­peal­ing to me (I had to look up 3. to check what the ob­vi­ous an­swer was sup­posed to be). Ob­vi­ously the ma­chine-wid­get ra­tio hasn’t changed, and ob­vi­ously ex­po­nen­tial growth works like ex­po­nen­tial growth.

When I see 1., how­ever, I always think ‘oh it’s that bas­tard bat and ball ques­tion again, I know the cor­rect an­swer but can­not see it’. And I have to stare at it for a minute or so to work it out, slowed down dra­mat­i­cally by the fact that Ob­vi­ous Wrong An­swer is jump­ing up and down try­ing to dis­tract me.

If this test was re­ally test­ing my propen­sity for effort­ful thought over spon­ta­neous in­tu­ition, I ought to score zero. I hate effort­ful thought! As it is, I score two out of three, be­cause I’ve trained my in­tu­itions nicely for ra­tios and ex­po­nen­tial growth. The ‘in­tu­itive’, ‘Sys­tem 1’ an­swer that pops into my head is, in fact, the cor­rect an­swer, and the sup­pos­edly ‘in­tu­itive-but-wrong’ an­swers feel bad on a visceral level. (Why the hell would the lily pads take the same amount of time to cover the sec­ond half of the lake as the first half, when the rate of growth is in­creas­ing?)
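For anyone who wants to see why the ‘right’ answers to (2) and (3) fall out so directly, here is a minimal sketch of the arithmetic. The function names are my own, and it assumes every machine works at the same constant rate and that the lily patch doubles exactly once per day.

```python
from fractions import Fraction


def widget_minutes(machines: int, widgets: int) -> Fraction:
    """Minutes for `machines` machines to make `widgets` widgets,
    given that 5 machines make 5 widgets in 5 minutes."""
    # 5 widgets from 5 machines in 5 minutes: each machine makes 1/5 widget per minute.
    rate_per_machine = Fraction(5, 5 * 5)
    return Fraction(widgets) / (machines * rate_per_machine)


def lily_coverage(day: int, full_day: int = 48) -> Fraction:
    """Fraction of the lake covered on `day`, if the patch doubles daily
    and covers the whole lake on `full_day`."""
    return Fraction(2) ** (day - full_day)


print(widget_minutes(100, 100))  # 5
print(lily_coverage(47))         # 1/2
print(lily_coverage(24))         # 1/16777216
```

The last line is the reason the ‘24 days’ answer feels so wrong to me: halfway through the 48 days, the patch covers a tiny sliver of the lake, not half of it.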

The bat and ball still gets me, though. My gut hasn’t in­ter­nal­ised any­thing use­ful, and it’s su­per keen on shout­ing out the wrong an­swer in a dis­tract­ing way. My dis­like for effort­ful thought is definitely a prob­lem here.

I wanted to see if oth­ers had raised the same ob­jec­tion, so I started do­ing some re­search into the CRT. In the pro­cess I dis­cov­ered a lot of fol­low-up work that makes the story much more com­plex and in­ter­est­ing.

I’ve come nowhere near to do­ing a proper liter­a­ture re­view. Fred­er­ick’s origi­nal pa­per has been cited nearly 3000 times, and dredg­ing through that for the good bits is a lot more work than I’m will­ing to put in. This is just a sum­mary of the in­ter­est­ing stuff I found on my limited, par­tial dig through the liter­a­ture.

# Think­ing, in­her­ently fast and in­her­ently slow

Fred­er­ick’s origi­nal Cog­ni­tive Reflec­tion Test pa­per de­scribes the Sys­tem 1/​Sys­tem 2 di­vide in the fol­low­ing way:

Recognizing that the face of the person entering the classroom belongs to your math teacher involves System 1 processes — it occurs instantly and effortlessly and is unaffected by intellect, alertness, motivation or the difficulty of the math problem being attempted at the time. Conversely, finding $\sqrt{19163}$ to two decimal places without a calculator involves System 2 processes — mental operations requiring effort, motivation, concentration, and the execution of learned rules.

I find it in­ter­est­ing that he frames men­tal pro­cesses as be­ing in­her­ently effortless or effort­ful, in­de­pen­dent of the per­son do­ing the think­ing. This is not quite true even for the ex­am­ples he gives — face­blind peo­ple and calcu­lat­ing prodi­gies ex­ist.

This fram­ing is im­por­tant for in­ter­pret­ing the CRT. If the prob­lem in­her­ently has a wrong ‘Sys­tem 1 solu­tion’ and a cor­rect ‘Sys­tem 2 solu­tion’, the CRT can work as in­tended, as an effi­cient tool to split peo­ple by their propen­sity to use one strat­egy or the other. If there are ‘Sys­tem 1’ ways to get the cor­rect an­swer, the whole thing gets much more mud­dled, and it’s hard to dis­en­tan­gle nat­u­ral propen­sity to re­flec­tion from prior ex­po­sure to the right math­e­mat­i­cal con­cepts.

My tentative guess is that the bat and ball problem is close to being this kind of efficient tool. Although in some ways it’s the simplest of the three problems, solving it in a ‘fast’, ‘intuitive’ way relies on seeing the problem in a way that most people’s education won’t have provided. (I think this is true, anyway; I’ll go into more detail later.) I suspect that this is less true of the other two problems: ratios and exponential growth are topics that a mathematical or scientific education is more likely to build intuition for.

(Aside: I’d like to know how these other two prob­lems were cho­sen. The pa­per just states the fol­low­ing:

Mo­ti­vated by this re­sult [the an­swers to the bat and ball ques­tion], two other prob­lems found to yield im­pul­sive er­ro­neous re­sponses were in­cluded with the “bat and ball” prob­lem to form a sim­ple, three-item “Cog­ni­tive Reflec­tion Test” (CRT), shown in Figure 1.

I have a vague sus­pi­cion that Fred­er­ick trawled through some­thing like ‘The Bumper Book of An­noy­ing Rid­dles’ to find some brain­teasers that don’t re­quire too much in the way of math­e­mat­i­cal pre­req­ui­sites. The lily­pads one has a fam­ily re­sem­blance to the clas­sic grains-of-wheat-on-a-chess­board puz­zle, for in­stance.)

How­ever, I haven’t found any great ev­i­dence ei­ther way for this guess. The origi­nal pa­per doesn’t break down par­ti­ci­pants’ scores by ques­tion – it just gives mean scores on the test as a whole. I did how­ever find this meta-anal­y­sis of 118 CRT stud­ies, which shows that the bat and ball ques­tion is the most difficult on av­er­age – only 32% of all par­ti­ci­pants get it right, com­pared with 40% for the wid­gets and 48% for the lily­pads. It also has the biggest jump in suc­cess rate when com­par­ing uni­ver­sity stu­dents with non-stu­dents. That looks like bet­ter math­e­mat­i­cal ed­u­ca­tion does help on the bat and ball, but it doesn’t clear up how it helps. It could im­prove par­ti­ci­pants’ abil­ity to in­tu­itively see the an­swer. Or it could im­prove abil­ity to come up with an ‘un­in­tu­itive’ solu­tion, like solv­ing the cor­re­spond­ing si­mul­ta­neous equa­tions by a rote method.

What I’d re­ally like is some in­sight into what in­di­vi­d­ual peo­ple ac­tu­ally do when they try to solve the prob­lems, rather than just this ag­gre­gate statis­ti­cal in­for­ma­tion. I haven’t found ex­actly what I wanted, but I did turn up a few in­ter­est­ing stud­ies on the way.

# No, se­ri­ously, the an­swer isn’t ten cents

My favourite thing I found was this (ap­par­ently un­pub­lished) ‘ex­tremely rough draft’ by Meyer, Spunt and Fred­er­ick from 2013, re­vis­it­ing the bat and ball prob­lem. The in­tu­itive-but-wrong an­swer turns out to be ex­tremely sticky, and the pa­per is ba­si­cally a se­ries of in­creas­ingly des­per­ate at­tempts to get peo­ple to ac­tu­ally think about the ques­tion.

One con­jec­ture for what peo­ple are do­ing when they get this ques­tion wrong is the at­tribute sub­sti­tu­tion hy­poth­e­sis. This was sug­gested early on by Kah­ne­man and Fred­er­ick, and is a fancy way of say­ing that they are in­stead solv­ing the fol­low­ing sim­pler prob­lem:

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00.

How much does the ball cost? _____ cents

No­tice that this is miss­ing the ‘more than the ball’ clause at the end, turn­ing the ques­tion into a much sim­pler ar­ith­metic prob­lem. This sim­ple prob­lem does have ‘ten cents’ as the an­swer, so it’s very plau­si­ble that peo­ple are get­ting con­fused by it.
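To make the difference between the two questions concrete, here is a small sketch working in cents. The function names are hypothetical and only there for illustration; the one assumption is that prices are whole numbers of cents.

```python
TOTAL_CENTS = 110


def ball_if_bat_costs(total: int, bat: int) -> int:
    """Substituted question: the bat costs a fixed amount outright."""
    return total - bat


def ball_if_bat_costs_more_by(total: int, difference: int) -> float:
    """Original question: the bat costs `difference` more than the ball.
    ball + (ball + difference) = total, so ball = (total - difference) / 2."""
    return (total - difference) / 2


print(ball_if_bat_costs(TOTAL_CENTS, 100))          # 10
print(ball_if_bat_costs_more_by(TOTAL_CENTS, 100))  # 5.0
```

The only difference between the two functions is the division by two, which is exactly the step the missing clause hides.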

Meyer, Spunt and Fred­er­ick tested this hy­poth­e­sis by get­ting re­spon­dents to re­call the prob­lem from mem­ory. This showed a clear differ­ence: 94% of ‘five cent’ re­spon­dents could re­call the cor­rect ques­tion, but only 61% of ‘ten cent’ re­spon­dents. It’s pos­si­ble that there is a differ­ent com­mon cause of both the ‘ten cent’ re­sponse and mis­re­mem­ber­ing the ques­tion, but it at least gives some sup­port for the sub­sti­tu­tion hy­poth­e­sis.

How­ever, get­ting peo­ple to ac­tu­ally an­swer the ques­tion cor­rectly was a much more difficult prob­lem. First they tried bold­ing the words more than the ball to make this clause more salient. This made sur­pris­ingly lit­tle im­pact: 29% of re­spon­dents solved it, com­pared with 24% for the origi­nal prob­lem. Print­ing both ver­sions was slightly more suc­cess­ful, bump­ing up the cor­rect re­sponse to 35%, but it was still a small effect.

After this, they ditched subtlety and resorted to pasting huge warnings above the question.

These were still only mildly effective, with the rate of correct solutions rising from 45% to 50%. People just really like the answer ‘ten cents’, it seems.

At this point they com­pletely gave up and just flat out added “HINT: 10 cents is not the an­swer.” This worked rea­son­ably well, though there was still a hard core of 13% who per­sisted in writ­ing down ‘ten cents’.

That’s where they left it. At this point there’s not re­ally any room to es­ca­late be­yond con­fis­cat­ing the re­spon­dents’ pens and pre­filling in the an­swer ‘five cents’, and I worry that some­body would still try and scratch in ‘ten cents’ in their own blood. The wrong an­swer is just in­cred­ibly com­pel­ling.

# So, what are peo­ple do­ing when they solve this prob­lem?

Unfortunately, it’s hard to tell from the published literature (or at least what I found of it). What I’d really like is lots of transcripts of individuals talking through their problem solving process. The closest I found was this paper by Szaszi et al, who did carry out this sort of interview, but it doesn’t include any examples of individual responses. Instead, it gives an aggregated overview of types of responses, which doesn’t go into the kind of detail I’d like.

Still, the ex­am­ples given for their re­sponse cat­e­gories give a few clues. The cat­e­gories are:

• Cor­rect an­swer, cor­rect start. Ex­am­ple given: ‘I see. This is an equa­tion. Thus if the ball equals to x, the bat equals to x plus 1… ’

• Cor­rect an­swer, in­cor­rect start. Ex­am­ple: ‘I would say 10 cents… But this can­not be true as it does not sum up to €1.10...’

• In­cor­rect an­swer, re­flec­tive, i.e. some effort was made to re­con­sider the an­swer given, even if it was ul­ti­mately in­cor­rect. Ex­am­ple: ‘… but I’m not sure… If to­gether they cost €1.10, and the bat costs €1 more than the ball… the solu­tion should be 10 cents. I’m done.’

• No re­flec­tion. Ex­am­ple: ‘Ok. I’m done.’

Th­ese demon­strate one way to rea­son your way to the cor­rect an­swer (solve the si­mul­ta­neous equa­tions) and one way to be wrong (just blurt out the an­swer). They also demon­strate one way to re­cover from an in­cor­rect solu­tion (think about the an­swer you blurted out and see if it ac­tu­ally works). Still, it’s all rather ab­stract and high level.

# How To Solve It

How­ever, I did man­age to stum­ble onto an­other source of in­sight. While re­search­ing the prob­lem I came across this ar­ti­cle from the on­line mag­a­z­ine of the As­so­ci­a­tion for Psy­cholog­i­cal Science, which dis­cusses a var­i­ant ‘Ford and Fer­rari prob­lem’. This is quite in­ter­est­ing in it­self, but I was most ex­cited by the com­ments sec­tion. Fi­nally some ex­am­ples of how the prob­lem is solved in the wild!

The simplest ‘analytical’, ‘System 2’ solution is to rewrite the problem as two simultaneous linear equations and plug-and-chug your way to the correct answer. For example, writing $B$ for the bat and $b$ for the ball (both in cents), we get the two equations

$$B + b = 110, \qquad B - b = 100,$$

which we could then solve in various standard ways, e.g.

$$2B = 210, \qquad B = 105,$$

which then gives

$$b = 110 - B = 5.$$

There are a cou­ple of var­i­ants of this ex­plained in the com­ments. It’s a very re­li­able way to tackle the prob­lem: if you already know how to do this sort of rote method, there are no sur­prises. This sort of method would work for any similar prob­lem in­volv­ing lin­ear equa­tions.
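As a rough illustration of how mechanical the rote method is, here is a sketch of the elimination step for any problem of the form ‘two prices with a known sum and a known difference’. The function name is my own invention, and prices are in cents.

```python
def solve_sum_and_difference(total: float, difference: float) -> tuple[float, float]:
    """Solve bat + ball = total and bat - ball = difference by elimination."""
    # Adding the two equations eliminates the ball: 2 * bat = total + difference.
    bat = (total + difference) / 2
    # Substitute back into the first equation to recover the ball.
    ball = total - bat
    return bat, ball


bat, ball = solve_sum_and_difference(110, 100)
print(bat, ball)  # 105.0 5.0
```

Nothing here requires any insight into bats or balls, which is both the strength and the weakness of the method.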

How­ever, it’s pretty ob­vi­ous that a lot of peo­ple won’t have ac­cess to this method. Plenty of peo­ple noped out of math­e­mat­ics long be­fore they got to si­mul­ta­neous equa­tions, so they won’t be able to solve it this way. What might be less ob­vi­ous, at least if you mostly live in a high-maths-abil­ity bub­ble, is that these peo­ple may also be miss­ing the sort of tacit math­e­mat­i­cal back­ground that would even al­low them to frame the prob­lem in a use­ful form in the first place.

That sounds a bit ab­stract, so let’s look at some re­sponses (I’ll paste all these straight in, so any ty­pos are in the origi­nal). First, we have these two con­fused com­menters:

The thing is, why does the ball have to be $.05? It could have been .04 0r.03 and the bat would still cost more than $1.

and

This is ex­actly what both­ers me and re­sulted in me want­ing to look up the ques­tion on­line. On the quiz the other 2 ques­tions were defini­tive. This one tech­ni­cally could have more than one an­swer so this is where phy­col­o­gists ac­tu­ally mess up when try­ing to give us a trick ques­tion. The ball at .4 and the bat at 1.06 doesn’t break the rule ei­ther.

These commenters don’t automatically see two equations in two variables that together are enough to constrain the problem. Instead they seem to focus mainly on the first condition (adding up to $1.10) and just use the second one as a vague check at best (‘the bat would still cost more than $1’). This means that they are unable to immediately tell that the problem has a unique solution.
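Here is a quick sketch of the gap between the two readings. It assumes prices are whole numbers of cents, and the variable names are mine. Reading the second condition loosely (‘the bat costs more than $1’) leaves plenty of candidates, while reading it exactly leaves only one.

```python
TOTAL = 110  # cents

# Every (bat, ball) split of 110 cents into whole cents.
splits = [(TOTAL - ball, ball) for ball in range(TOTAL + 1)]

# Loose reading: the bat merely costs more than 100 cents.
loose = [(bat, ball) for bat, ball in splits if bat > 100]

# Exact reading: the bat costs exactly 100 cents more than the ball.
exact = [(bat, ball) for bat, ball in splits if bat - ball == 100]

print(len(loose), exact)  # 10 [(105, 5)]
```

The commenters’ suggested answers, such as a ball at 4 cents and a bat at $1.06, all sit in the `loose` list but not in `exact`.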

In response, another commenter, Tony, suggests a correct solution which is an interesting mix of writing the problem out formally and then figuring out the answer by trial and error:

I hear your pain. I feel as though psy­chol­o­gists and psy­chi­a­trists get to­gether ev­ery now and then to prove how stoopid I am. How­ever, af­ter more than a lit­tle head scratch­ing I’ve gained an un­der­stand­ing of this puz­zle. It can be ex­pressed as two facts and a ques­tion A=100+B and A+B=110, so B=? If B=2 then the solu­tion would be 100+2+2 and A+B would be 104. If B=6 then the solu­tion would be 100+6+6 and A+B would be 112. But as be KNOW A+B=110 the only num­ber for B on it’s own is 5.

This sug­gests enough half-re­mem­bered math­e­mat­i­cal knowl­edge to find a sen­si­ble ab­stract fram­ing, but not enough to solve it the stan­dard way.
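Tony’s approach can be sketched as a guess-and-check loop. This is my own reconstruction rather than anything he wrote: the names are hypothetical, and I’m assuming he would try whole-cent values for B in turn.

```python
def total_for(ball: int, difference: int = 100) -> int:
    """A + B when A = difference + B, following Tony's '100 + B + B'."""
    return difference + ball + ball


def guess_ball(total: int = 110, difference: int = 100):
    """Try each whole-cent price for the ball until A + B hits the total."""
    for ball in range(total + 1):
        if total_for(ball, difference) == total:
            return ball
    return None  # no whole-cent price works


print(total_for(2), total_for(6))  # 104 112
print(guess_ball())                # 5
```

The first printed line reproduces his two trial values, which undershoot and overshoot 110, and the second is the value that fits.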

Fi­nally, com­menter Marlo Eu­gene pro­vides an in­ge­nious way of solv­ing the prob­lem with­out writ­ing all the alge­braic steps out:

Lin­guis­tics makes all the differ­ence. The con­cep­tual em­pha­sis seems to lie within the word MORE.

X + Y = $1.10. If X = $1 MORE then that leaves $0.10 TO WORK WITH rather than automatically assign to Y

So you divide the remainder equally (assuming negative values are disqualified) and get 0.05.

So even this small sample of comments suggests a wide diversity of problem-solving methods leading to the two common answers. Further, these solutions don’t all split neatly into ‘System 1’ ‘intuitive’ and ‘System 2’ ‘analytic’. Marlo Eugene’s solution, for instance, is a mixed solution of writing the equations down in a formal way, but then finding a clever way of just seeing the answer rather than solving them by rote.

I’d still appreciate more detailed transcripts, including the time taken to solve the problem. My suspicion is still that very few people solve this problem with a fast intuitive response, in the way that I rapidly see the correct answer to the lilypad question. Even the more ‘intuitive’ responses, like Marlo Eugene’s, seem to rely on some initial careful reflection and a good initial framing of the problem.

If I’m correct about this lack of fast responses, my tentative guess for the reason is that it has something to do with the way most of us learn simultaneous equations in school. We generally learn arithmetic as young children in a fairly concrete way, with the formal numerical problems supplemented with lots of specific examples of adding up apples and bananas and so forth. But then, for some reason, this goes completely out of the window once the unknown quantity isn’t sitting on its own on one side of the equals sign. This is instead hived off into its own separate subject, called ‘algebra’, and the rules are taught much later in a much more formalised style, without much attempt to build up intuition first.

(One exception is the sort of puzzle sheets that are often given to young kids, where the unknowns are just empty boxes to be filled in. Sometimes you get 2+3=□, sometimes it’s 2+□=5, but either way you go about the same process of using your wits to figure out the answer. Then, for some reason I’ll never understand, the worksheets get put away and the poor kids don’t see the subject again until years later, when the box is now called $x$ for some reason and you have to find the answer by defined rules. Anyway, this is a separate rant.)

This lack of a rich background in puzzling out the answer to specific concrete problems means most of us lean hard on formal rules in this domain, even if we’re relatively mathematically sophisticated. Only a few build up the necessary repertoire of tricks to solve the problem quickly by insight. I’m reminded of a story in Feynman’s The Pleasure of Finding Things Out:

Around that time my cousin, who was three years older, was in high school. He was having considerable difficulty with his algebra, so a tutor would come. I was allowed to sit in a corner while the tutor would try to teach my cousin algebra. I’d hear him talking about x.

I said to my cousin, “What are you trying to do?”

“I’m trying to find out what x is, like in 2x + 7 = 15.”

I say, “You mean 4.”

“Yeah, but you did it by arithmetic. You have to do it by algebra.”

I learned algebra, fortunately, not by going to school, but by finding my aunt’s old schoolbook in the attic, and understanding that the whole idea was to find out what x is; it doesn’t make any difference how you do it.

I think this reliance on formal methods might be somewhat less true for exponential growth and ratios, the subjects underpinning the lilypad and widget questions. Certainly I seem to have better intuition there, without having to resort to rote calculation. But I’m not sure how general this is.

# How To Visualise It

If you wanted to solve the bat and ball problem without having to ‘do it by algebra’, how would you go about it?

My original post on the problem was a pretty quick, throwaway job, but over time it picked up some truly excellent comments by anders and Kyzentun, which really start to dig into the structure of the problem and suggest ways to ‘just see’ the answer. The thread with anders in particular goes into lots of other examples of how we think through solving various problems, and is well worth reading in full. I’ll only summarise the bat-and-ball-related parts of the comments here.

We all used some variant of the method suggested by Marlo Eugene in the comments above. Writing out the basic problem again, we have:

$$B + b = 110, \qquad B - b = 100.$$

Now, instead of immediately jumping to the standard method of eliminating one of the variables, we can just look at what these two equations are saying and solve it directly ‘by thinking’. We have a bat, $B$. If you add the price of the ball, $b$, you get 110 cents. If you instead remove the same quantity $b$ you get 100 cents. So the bat’s price must be exactly halfway between these two numbers, at 105 cents. That leaves five for the ball.

Now that I’m thinking of the problem in this way, I directly see the equations as being ‘about a bat that’s halfway between 100 and 110 cents’, and the answer is incredibly obvious.

Kyzentun suggests a variant on the problem that is much less counterintuitive than the original:

A centered piece of text and its margins are 110 columns wide. The text is 100 columns wide. How wide is one margin?

Same numbers, same mathematical formula to reach the solution. But less misleading because you know there are two margins, and thus know to divide by two after subtracting.

In the original problem, the 110 units and 100 units both refer to something abstract, the sum and difference of the bat and ball.
In Kyzentun’s version these become much more concrete objects, the width of the text and the total width of the margins. The work of seeing the equations as relating to something concrete has mostly been done for you.

Similarly, anders works the problem by ‘getting rid of the 100 cents’, and splitting the remainder in half to get at the price of the ball:

I just had an easy time with #1 which I haven’t before. What I did was take away the difference so that all the items are the same (subtract 100), evenly divide the remainder among the items (divide 10 by 2) and then add the residuals back on to get 105 and 5.

The heuristic I seem to be using is to treat objects as made up of a value plus a residual. So when they gave me the residual my next thought was “now all the objects are the same, so whatever I do to one I do to all of them”.

I think that after reasoning my way through all these perspectives, I’m finally at the point where I have a quick, ‘intuitive’ understanding of the problem. But it’s surprising how much work it was for such a simple bit of algebra.

# Final thoughts

Rather than making any big conclusions, the main thing I wanted to demonstrate in this post is how complicated the story gets when you look at one problem in detail. I’ve written about close reading recently, and this has been something like a close reading of the bat and ball problem.

Frederick’s original paper on the Cognitive Reflection Test is in that generic social science style where you define a new metric and then see how it correlates with a bunch of other macroscale factors (either big social categories like gender or education level, or the results of other statistical tests that try to measure factors like time preference or risk preference).
There’s a strange indifference to the details of the test itself – at no point does he discuss why he picked those specific three questions, and there’s no attempt to model what was making the intuitive-but-wrong answer appealing.

The later paper by Meyer, Spunt and Frederick is much more interesting to me, because it really starts to pick apart the specifics of the bat and ball problem. Is an easier question getting substituted? Can participants reproduce the correct question from memory?

I learned the most from the individual responses, though. This is where you really get to see the variety of ways that people tackle the problem. Careful reflection definitely seems to improve the chance of a correct answer in general, but many of the responses don’t really fit the neat ‘fast vs slow’ division of the original setup.

# Questions

I’m interested in any comments on the post, but here are a few specific things I’d like to get your answers to:

• My rapid, intuitive answer for the bat and ball question is wrong (at least until I retrained it by thinking about the problem way too much). However, for the other two I ‘just see’ the correct answer. Is this common for other people, or do you have a different split?

• If you’re able to rapidly ‘just see’ the answer to the bat and ball question, how do you do it?

• How do people go about designing tests like these? This isn’t at all my field and I’d be interested in any good sources. I’d kind of assumed that there’d be some kind of serious-business Test Creation Methodology, but for the CRT at least it looks like people just noticed they got surprising answers for the bat and ball question and looked around for similar questions. Is that unusual compared to other psychological tests?

• My daughter is just starting to learn subtraction.
She was very frustrated by it, and if I verbally asked “What’s seven minus five?” she was about 50% likely to give the right answer. I asked her a sequence of simple subtraction problems and she consistently performed at about that level. In the course of our back and forth I switch my phrasing to the form “You have seven apples and you take away five, how many left?” and she immediately started answering the questions 100% correctly, very rapidly too. Experimentally I switched back to the prior form and she started getting them wrong again.

It was apparent to me that simply phrasing the problem in terms of concrete objects was activating something like visualization which made the problems easy, and just phrasing it as abstract numbers was failing to activate this switch. So as you say, for more tricky arithmetic problems, it may be the case that what mental circuits are “activated automatically” determine the first answer you arrive at, and you can exploit that effect with edge cases like this.

• Strangely, it can sometimes also go the other way!

One of my most eye-opening teaching experiences occurred when I was helping a six-year-old who was struggling with basic addition – or so it appeared. She was trying to work through a book that helped her to the concept of addition via various examples such as “If Nellie has three apples and is then given two more, how many apples does she have?” The poor little girl didn’t have a clue. However, after spending a short time with her I discovered that she could do 3+2 with no problem whatsoever. In fact, she had no trouble with addition. She just couldn’t get her head around all these wretched apples, cakes, monkeys etc that were being used to “explain” the concept of addition to her.
She needed to work through the book almost “backwards” – I had to help her understand that adding up apples was just an example of an abstract addition she could do perfectly well! Her problem was that all the books for six-year-olds went the other way round.

I think this is unusual though.

• Ooh, I’d forgotten about that test, and how the beer version was much easier. That would be another good one to read up on.

• I suspect that this is less true of the other two problems: ratios and exponential growth are topics that a mathematical or scientific education is more likely to build intuition for.

This seems to be contradicted by:

the bat and ball question is the most difficult on average – only 32% of all participants get it right, compared with 40% for the widgets and 48% for the lilypads. It also has the biggest jump in success rate when comparing university students with non-students.

• Ah yeah, I meant to make this bit clearer and forgot. I’m not really sure what to make of that statement you put in italics. The jump in success rate could be down to better trained intuition. It could also be due to better access to formal methods. I don’t really see it as good evidence for my guess either way. If I get more time later I’ll edit the post.

• I didn’t “just see” the answers to the questions the first time I saw them, but neither would I say that I had to solve them entirely formally. It was more like docking a boat: the river keeps tugging at the tail end, until you feel the boat’s side touch the berth and know it has stopped. There’s a kind of natural inertia to this kind of puzzles.

Also, there is a kind of problems like “one wallet contains ten coins, another one contains twice more, and the total is twenty; explain” that get asked much earlier than kids learn algebra, if I remember right.
But it gets dismissed, in favour of cases where you must learn not to count the same bits of evidence twice (cough Bayes cough). I like to think this dismissal bites people in the backside when they learn Mendelian genetics (more easily seen when the genes in question interact hierarchically) or, Merlin forbid, mass-spectrometry, where the math difficulty is complicated by the chem difficulty of molecules not dividing into usual subunits.

Whew, I was thinking to write a separate post on this, but now I don’t have to! Profit!

• It’s been many years since I first saw this question, so my memories may not be accurate, but I think my internal thoughts went something like this:

‘Well 1.10 minus 1 is .10, but wait I know this is a trick question so … Ah! I also need to divide by 2. The answer is .05.’ And then I checked my answer by doing 1.05 + .05 and 1.05 - .05.

Introspecting now on why I leaped to the idea of dividing by two, I think what I was seeing was something like: In this context “costs $1.00 more than” means Exactly $1 more than, so it’s saying that without the $1 the two things are equal and you need to divide the cost between them.

This makes me think of ordinary real life contexts where I would say “costs $1.00 (or $20 or $100) more than.” It seems possible it might be clear to both me and my listener I meant ‘at least x more than,’ ‘as much as x more than,’ or ‘approximately x more than.’ I wonder if changing the wording to “The bat costs exactly $1.00 more than the ball” would help any.

• This is exactly what bothers me and resulted in me wanting to look up the question online. On the quiz the other 2 questions were definitive. This one technically could have more than one answer, so this is where psychologists actually mess up when trying to give us a trick question. The ball at .04 and the bat at 1.06 doesn’t break the rule either.
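
A minimal sketch of the ambiguity raised here, assuming prices in whole cents. The name `consistent_prices` and its `exact` flag are made up for illustration: `exact=True` reads “$1.00 more” as exactly one dollar, while `exact=False` reads it as “at least one dollar more”.

```python
TOTAL_CENTS = 110           # bat + ball, from the puzzle
DIFFERENCE_CENTS = 100      # "the bat costs $1.00 more than the ball"


def consistent_prices(exact: bool) -> list[tuple[int, int]]:
    """Return every (ball, bat) pair in whole cents that fits the puzzle.

    exact=True  -> bat - ball == 100  (the intended reading)
    exact=False -> bat - ball >= 100  (the looser everyday reading)
    """
    solutions = []
    for ball in range(TOTAL_CENTS + 1):
        bat = TOTAL_CENTS - ball  # the total is fixed, so the bat follows
        difference = bat - ball
        if exact:
            fits = difference == DIFFERENCE_CENTS
        else:
            fits = difference >= DIFFERENCE_CENTS
        if fits:
            solutions.append((ball, bat))
    return solutions


print(consistent_prices(exact=True))
print(consistent_prices(exact=False))
```

Under the exact reading only one pair survives; under the looser reading any ball price from 0 to 5 cents fits, which seems to be the complaint being made.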

Interesting: these could cover a couple of misunderstandings: one is that B>=100, the other that “The bat costs $1.00 more than the ball” does not mean B-b=100, but that B-b>=100. In ordinary language, “that costs $1.00 more than the other one” is not incorrect if the difference is $1.01. I suspect that person would have been corrected by saying “the bat costs precisely one dollar more than the ball”.

• I have the same experience as you, drossbucket: my rapid answer to (1) was the common incorrect answer, but for (2) and (3) my intuition is well-honed.

A possible reason for this is that the intuitive but incorrect answer in (1) is a decent approximation to the correct answer, whereas the common incorrect answers in (2) and (3) are wildly off the correct answer. For (1) I have to explicitly do a calculation to verify the incorrectness of the rapid answer, whereas in (2) and (3) my understanding of the situation immediately rules out the incorrect answers.

Here are questions which might be similar to (1):

(4a) I booked seats J23 to J29 in a cinema. How many seats have I booked?

(4b) There is a 20m fence in which the fence posts are 2m apart. How many fence posts are there?

(4c) How many numbers are there in this list: 200,201,202,203,204,...,300.

(5) In 24 hours, how many times do the hour-hand and minute-hand of a standard clock overlap?

(6) You are in a race and you just overtake second place. What is your new position in the race?

• The bat and ball problem I answer in what I’ll call one conscious time-step with the correct “five cents”, but it happens too fast for me to verify how (beyond the usual trouble with verifying internal reflection).
I would speculate, in decreasing order of intuitive probability, that in order to get the answer, either (a) I’ve seen an exactly analogous “trick” problem before and am pattern-matching on that or (b) I’m doing the algebra quickly using my seemingly well-developed mathematical intuition. I can also imagine (c) I’m leaping to the “wrong” answer, then trying to verify it, noticing it’s wrong, and correcting it, all in the same subconscious flash, but that feels off. Imagining the “ten cents” answer doesn’t actually feel compelling; it just feels wrong. (It feels like a similar emotion to noticing I’ve gotten the wrong amount of change, in fact.)

The widgets problem I do a noticeable double-take on, but it’s rapidly corrected within one conscious time-step; the “100” is a momentary flicker before my brain settles on the correct answer. Imagining “100” afterwards feels wrong, but less immediately so than “ten cents” did. It feels like I have a bias there toward answering “how many widgets can you produce in a fixed time” questions, so I might have an echo of the misreading “how many widgets can 100 machines produce in [assumed to be the same amount of time as before, since no contrary time value is presented to override this]”.

The lily pads question takes me a conscious time-step longer to answer than either of the other two; the initial flash is “inconclusive”, and then I see myself rechecking the part where the quantity doubles every step before answering “47”. (I notice I didn’t remember that the steps were days, only remembering that there was a time unit; I don’t know if that’s relevant.) Imagining “24” afterwards feels some intermediate level of wrong between “ten cents” and “100”; my mental graph of the growth curve puts the expected value 24 at “way too low” intuitively before I can compute the actual exponent.
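
A small sketch can make the “way too low” intuition for the lily pads concrete. It assumes the standard CRT wording (the patch doubles every day and covers the whole lake on day 48); `coverage_on` is a hypothetical helper, not something from the test itself.

```python
FULL_DAY = 48  # assumption: standard CRT wording, lake fully covered on day 48


def coverage_on(day: int, full_day: int = FULL_DAY) -> float:
    """Fraction of the lake covered on `day`, if the patch doubles daily
    and reaches full coverage (1.0) on `full_day`."""
    return 2.0 ** (day - full_day)


print(coverage_on(47))  # half the lake, one doubling before full coverage
print(coverage_on(24))  # a tiny fraction, roughly 6e-08
```

Working backwards from full coverage, each earlier day halves the patch, so the half-covered day is one step before the end and day 24 is nowhere near half.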
• However, for the other two I ‘just see’ the correct answer. Is this common for other people, or do you have a different split?

For all three questions, the wrong answer comes to my mind first*. But especially in the context of expecting a trick question, I second-guess it and come up with the correct answer fairly quickly.

*In the third question, the actual wrong answer “24” does not come to mind first, but the general sense of “half that number” does. My mind does not actually calculate what half of 48 is before finishing thinking through the problem.

• I just saw the answer to the bat and ball problem within a few seconds. As I remember, my thought process was something like: Could it be 10 cents? No, that adds up to $1.20. So there’s an extra 10 cents. Oh, of course, the difference between $1 and $1.10 has to be distributed evenly between both items, so the answer is 5 cents.

I’ve taken a course that cov­ered si­mul­ta­neous equa­tions, but my mem­ory of it is hazy enough that I’m sure that method would’ve taken me much longer.

• I’m go­ing to pull a re­verse true scots­man here and say that is si­mul­ta­neous equa­tions. (When we think of ‘solv­ing si­mul­ta­neous equa­tions’ we imag­ine peo­ple pul­ling the an­swer out, rather than push­ing the solu­tion in and see­ing if it fits—solv­ing ver­sus check­ing as it were.)
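
Written out, the shortcut and the textbook method do look like the same calculation. A sketch, using B for the bat and b for the ball, both in cents (the symbols are borrowed from elsewhere in this thread):

```latex
\begin{align*}
B + b &= 110 \\
B - b &= 100 \\
\text{subtracting: } 2b &= 10 \\
b &= 5, \quad B = 105
\end{align*}
```

The line 2b = 10 is the “distribute the extra 10 cents evenly between both items” step, just stated formally.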

• How­ever, for the other two I ‘just see’ the cor­rect an­swer. Is this com­mon for other peo­ple, or do you have a differ­ent split?

I think I figured out and ver­ified the an­swer to all 3 ques­tions in 5-10 sec­onds each, when I first heard them (though I was ex­posed to them in the con­text of “Take the cog­ni­tive re­flec­tion test which peo­ple fail be­cause the ob­vi­ous an­swer is wrong”, which always felt like cheat­ing to me).

If I recall correctly, the third question was easier than the second question, which was easier than bat & ball: I think I generated the correct answer as a suggestion for 2 and 3 pretty much immediately (alongside the supposedly obvious answers), and I just had to check them. I can’t quite remember my strategy for bat & ball, but I think I generated the $0.10 ball, $1 bat answer, saw that the difference was $0.90 instead of $1, adjusted to $0.05, $1.05, and found that that one was correct.

• This is pretty much the same for me. I think the solution to bat and ball of “10 cents, oh no, that doesn’t work. Split the difference evenly for 5 cents? Yup, that’s better” is all done on System 1.

Kahneman’s examples of System 1 thinking include (I think) a chess grandmaster seeing a good chess move, so he includes the possibility of training your System 1 to be able to do more things. In the case of the OP, System 1 has been trained to really understand exponential growth and ratios. I think that for me “quickly check that your answer is right” and “try something vaguely sensible and see what happens” are both ingrained as general principles, so I don’t have to exert effort to apply them to simple problems.

A prob­lem which I would vol­un­teer for a CRT is the snail climb­ing out of a well. Here there’s an ob­vi­ous but wrong an­swer but I think if you re­al­ise that it’s wrong then the cor­rect an­swer isn’t too hard to figure out.
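
For readers who haven’t met the snail puzzle, here is a rough simulation. The comment gives no numbers, so the 10 m well, 3 m daytime climb and 2 m nightly slip below are purely hypothetical values chosen for illustration, as is the function name `days_to_escape`.

```python
def days_to_escape(depth: int, climb: int, slip: int) -> int:
    """Day on which the snail first reaches the top of the well.

    Each day the snail climbs `climb`; if it is not out yet, it slips
    back `slip` overnight. All three parameters are illustrative.
    """
    if climb <= 0 or (climb < depth and climb <= slip):
        raise ValueError("the snail never gets out with these values")
    height = 0
    day = 0
    while True:
        day += 1
        height += climb          # daytime climb
        if height >= depth:      # out of the well: no slip tonight
            return day
        height -= slip           # overnight slip


# Hypothetical numbers: net gain is 1 m per day, so "10 days" is tempting.
print(days_to_escape(depth=10, climb=3, slip=2))
```

With these made-up numbers the tempting answer is 10 days (one net metre per day), but the simulation returns 8, because the snail is out at the end of its final climb and never slips back.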