Decaf vs. regular coffee self-experiment

For a while, I’ve been convinced that decaf coffee has roughly the same effect on me as regular coffee. However, I haven’t been able to say with certainty because there’s huge potential for placebo effects. Starting tomorrow, I’ll be conducting a (99%) blinded experiment to test whether drinking regular vs. decaf coffee has a detectable effect on my mood and alertness.

I intend to record the metrics I describe in the next section under the ‘Data collection’ heading of this post and will also report results here once the experiment is over. I’ll also make data from my Quantified Mind experiment (discussed below) available in CSV format.

See something wrong with this experiment plan, my analysis, and/or my results? Email me at (see top left header for spelling) or comment on this post!



To prepare for the experiment, I split 2 weeks’ worth of coffee (7 days of decaf, 7 days of regular) into 14 bags. My lab assistant (girlfriend) then sorted the bags into a random order and labeled each one with its day, flipping a coin to decide whether that day’s coffee would be regular or decaf.
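
For anyone replicating this, here is a minimal sketch of one way to generate the day-by-day assignment in software. It assumes the goal is a uniformly random ordering of exactly 7 regular and 7 decaf bags. Shuffling the 14 labels (Python’s `random.shuffle`, a Fisher-Yates shuffle) guarantees the 7/7 split directly, whereas per-day coin flips need a rule for what happens once one type runs out. The function name and seed are illustrative, not what was used here.

```python
import random


def assign_days(n_regular=7, n_decaf=7, seed=None):
    """Return a randomly ordered list of 'regular'/'decaf' labels, one per day.

    Shuffling a fixed pool of labels guarantees exactly n_regular regular days
    and n_decaf decaf days, with every ordering equally likely.
    """
    rng = random.Random(seed)  # seeded RNG so the schedule can be reproduced
    labels = ["regular"] * n_regular + ["decaf"] * n_decaf
    rng.shuffle(labels)  # in-place Fisher-Yates shuffle
    return labels


if __name__ == "__main__":
    schedule = assign_days(seed=2020)
    for day, label in enumerate(schedule, start=1):
        print(f"Day {day:2d}: {label}")
```

Keeping the printed schedule with whoever labels the bags, not the person drinking the coffee, is what preserves the blinding.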


Starting tomorrow, for the next 14 days, I’ll make my coffee using the grounds from the labeled bags and track a few subjective metrics three times a day (right after having coffee, at 1 PM, and at 6 PM):

  • Alertness (1-5 scale): loosely defined as the opposite of how tired and sleepy I feel.

  • Sharpness (1-5 scale): the opposite of ‘fogginess’.

  • Mood (1-5 scale): a coarse-grained measure of how ‘good’ I feel emotionally.

  • Headache (yes/no): I get headaches if I don’t have coffee in the morning, so it will be interesting to see whether I get them on decaf days even when I don’t know the coffee is decaf.

  • Regular vs. decaf day (regular/decaf): “Given how I feel today, do I think this morning’s coffee was regular or decaf?”

I’ve also set up a Quantified Mind (QM) experiment which will give me 8 minutes of cognitive tests each day and record my scores. I’ll take these tests immediately upon finishing my coffee in the morning.


I’m using the Swiss Water-processed version of Joe Coffee Nightcap Decaf and Joe Coffee Colombia La Familia Guarnizo regular coffee, both brewed in a Mr. Coffee drip coffee maker. I chose Swiss Water for decaf after reading that it does the best job of removing caffeine, filtering out >99% of it from the beans.

For cognitive tests, I’m using QM’s 8-minute “coffee” test, which includes tests of executive function, working memory, and visuospatial something.


Sleep

Getting less than 7 hours of sleep affects how sharp I feel throughout the day and how awake I feel in the afternoon. Even though this is a randomized experiment, 14 days is short enough that, if my sleep schedule were disrupted, noise from variation in sleep quality could easily overwhelm the signal from drinking regular vs. decaf coffee. I plan to deal with this in two ways:

  1. Track how much sleep I get each night (in hours). Unfortunately, I don’t have a good sleep tracker, so this will just be based on estimates of when I went to bed, how long it took me to fall asleep, and when I woke up.

  2. Avoid major sleep disruptions and sleep roughly the same number of hours each night. That said, if this fails and my sleep schedule gets messed up during the experiment, I’ll be much less confident in the results.
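
Because the sleep numbers are estimates built from bedtime, time to fall asleep, and wake time, the arithmetic is easy to get wrong when bedtime falls before midnight. A minimal sketch, assuming times are written as 24-hour "HH:MM" strings and the time to fall asleep in minutes (both assumptions about the format, not how the spreadsheet actually stores them):

```python
from datetime import datetime, timedelta


def estimated_sleep_hours(bedtime: str, wake_time: str, latency_min: float) -> float:
    """Estimate hours slept from bedtime, wake time (24h 'HH:MM'), and minutes to fall asleep."""
    fmt = "%H:%M"
    bed = datetime.strptime(bedtime, fmt)
    wake = datetime.strptime(wake_time, fmt)
    if wake <= bed:
        # Wake time falls on the next calendar day (e.g. bed 23:30, wake 07:00).
        wake += timedelta(days=1)
    asleep = (wake - bed) - timedelta(minutes=latency_min)
    return max(asleep.total_seconds() / 3600, 0.0)


print(round(estimated_sleep_hours("23:30", "07:00", 20), 2))  # 7.17
```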

(ETA on 0302.) In the comments, Bucky points out that caffeine use may also affect sleep the following night, making the confounding even more complex. My current plan is to test for evidence of this when I do my analysis. I’m pre-registering that I’ll be surprised if my one cup of coffee in the morning has much of an impact, but being surprised is the whole point of something like this!
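
One simple first look at Bucky’s point is to line up each day’s coffee with the sleep recorded the following night and compare averages. A sketch, assuming `conditions[i]` is day i’s coffee and `sleep_hours[i]` is the sleep before day i’s coffee, so the night after day i is `sleep_hours[i + 1]` (that indexing convention is an assumption). With at most 13 usable pairs, this is descriptive rather than a test.

```python
from statistics import mean


def next_night_sleep_by_condition(conditions, sleep_hours):
    """Average sleep on the night *after* regular vs. decaf days.

    conditions[i]  -- 'regular' or 'decaf' for day i
    sleep_hours[i] -- hours slept the night before day i
    """
    if len(conditions) != len(sleep_hours):
        raise ValueError("need one sleep entry per day")
    after = {"regular": [], "decaf": []}
    # Pair day i's coffee with the following night (recorded on day i + 1);
    # the last day has no following night in the data, so it is dropped.
    for cond, next_sleep in zip(conditions[:-1], sleep_hours[1:]):
        after[cond].append(next_sleep)
    return {cond: mean(vals) for cond, vals in after.items() if vals}


conds = ["regular", "decaf", "decaf", "regular"]
sleep = [7.0, 6.5, 8.0, 7.5]
print(next_night_sleep_by_condition(conds, sleep))  # {'regular': 6.5, 'decaf': 7.75}
```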

Diet

Relative to sleep, I’m less convinced that regular dietary variation (i.e., eating relatively ‘healthy’ food and not being morbidly obese) has much of an effect on cognitive performance. But I will still keep my regular eating schedule of skipping breakfast and only having lunch and dinner, as I suspect this will also help me keep a regular sleep schedule. To keep myself honest here, I’ll track when I eat each day.

Internet Use

(ETA on 0302.)

I know this one seems weird, but anecdotally, I’ve found that procrastinating on the more addictive websites (read: Twitter) makes me feel a lot fuzzier for the rest of the day.

Caffeine Withdrawal

(ETA on 0302.)

In the comments, Issa Rice points out that caffeine withdrawal may begin anywhere from 12 to 24 hours after the last dose of caffeine and peaks around 50 hours. This means that, depending on how the regular and decaf days happen to be ordered, I may or may not go through full withdrawal, which would presumably affect my results.
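
To see which check-ins could fall inside the withdrawal window described above, one can compute hours since the last caffeinated coffee at each check-in for a given ordering of days. A minimal sketch, assuming coffee is drunk at the same hour every morning (8 AM here is a placeholder), that regular coffee was drunk the morning before the experiment started, and using the 12-hour onset and roughly 50-hour peak figures quoted above. The names and constants are illustrative.

```python
COFFEE_HOUR = 8               # assumed time of the morning coffee (placeholder)
CHECK_IN_HOURS = (8, 13, 18)  # right after coffee, 1 PM, 6 PM
ONSET_HOURS = 12              # earliest withdrawal onset quoted above
PEAK_HOURS = 50               # approximate peak quoted above


def hours_since_caffeine(schedule):
    """Hours since the most recent regular coffee at each check-in.

    schedule -- list of 'regular'/'decaf' labels, one per day.
    Returns a list of (day, check_in_hour, hours_since_last_regular).
    Assumes regular coffee was also drunk on the morning before day 0.
    """
    last_dose = COFFEE_HOUR - 24  # absolute hour of the pre-experiment coffee
    rows = []
    for day, label in enumerate(schedule):
        if label == "regular":
            last_dose = day * 24 + COFFEE_HOUR
        for hour in CHECK_IN_HOURS:
            rows.append((day, hour, day * 24 + hour - last_dose))
    return rows


for day, hour, since in hours_since_caffeine(["regular", "decaf", "decaf"]):
    if since >= PEAK_HOURS:
        stage = "near/after peak"
    elif since >= ONSET_HOURS:
        stage = "withdrawal possible"
    else:
        stage = ""
    print(f"day {day} at {hour:02d}:00: {since} h since caffeine {stage}")
```

Under these assumptions, with two decaf days in a row the 6 PM check-in on the second day comes 58 hours after the last caffeine, past the ~50-hour peak.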

Mood

(ETA on 0302.)

In the comments, Pattern points out that mood, and events that affect it, might affect the results. I’ve added a mood metric to my list of subjective metrics to track to prepare for this possibility.

Cognitive test practice effects

Because I didn’t take any QM tests before the experiment to calibrate, there’s a risk of practice effects dominating differences between caffeine and no-caffeine days. I’m not totally sure how to deal with this yet, but isn’t this the use case for random effects regressions?
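
One option that is simpler than a random-effects model (and not necessarily what the final analysis will use) is an ordinary least-squares regression of each day’s QM score on a linear day trend plus a regular-coffee indicator, so the caffeine coefficient is estimated net of steady improvement. A minimal sketch with NumPy; the straight-line practice trend is an assumption, since practice curves often flatten out.

```python
import numpy as np


def caffeine_effect_with_trend(days, regular, scores):
    """OLS fit of score = b0 + b_day * day + b_caf * regular.

    days    -- experiment day index (e.g. 1..14)
    regular -- 1 if that day's coffee was regular, 0 if decaf
    scores  -- that day's QM score
    Returns (b0, b_day, b_caf); b_caf is the regular-vs-decaf difference
    after removing a straight-line practice trend.
    """
    days = np.asarray(days, dtype=float)
    regular = np.asarray(regular, dtype=float)
    X = np.column_stack([np.ones_like(days), days, regular])
    coef, *_ = np.linalg.lstsq(X, np.asarray(scores, dtype=float), rcond=None)
    return tuple(coef)
```

Passing `np.log(days)` instead of `days` is one way to let the practice effect level off.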


On not pre-registering in detail

Since I’ve been reading Gelman’s wonderful Bayesian Data Analysis, and also view this study as a good candidate for a Bayesian approach because of its small sample size, I intend to use Bayesian methods for my analysis. In an ideal world, I’d pre-register exactly which analyses I intend to do now (as of 0301), but unfortunately, I’m still enough of a noob at this that I need to spend a good chunk of time reading about the right way to set up the analysis. For now, I’m recording the questions I want to answer below and will edit to add details of the analysis as I figure them out.

I worry less than I normally would about changing the analysis post hoc to find a significant result, because I don’t have strong incentives to find one. That is, I’m genuinely interested in the ‘true’ answer to the question and don’t have a strong desire for it to be ‘there’s a big effect’ or ‘there’s no effect’. Being transparent about the results of each stage of analysis should also help keep me honest. (Of course, I could always choose post hoc not to share intermediate stages, but again, I don’t think my incentives push me to do that.)

High-level plan

At a high level, I want to test the effect of regular vs. decaf coffee on alertness, sharpness, headaches, and my QM results. This is complicated by the fact that my prior is that the response variables I described above only share some common causes and that the causal effects of caffeine consumption differ between the response variables. For example, I suspect alertness and QM test scores are both affected by sleep quantity and coffee consumption, but that alertness may also be impacted by other confounding variables like mood and plans for the day.

To mitigate this, I’ll rely heavily on the most objective response variable, the QM results, to determine the magnitude of the ‘true effect’. In causal terms, this is equivalent to assuming that sleep is sufficient for blocking all ‘backdoor’ paths between regular vs. decaf coffee and cognitive ability. I’m still measuring the other subjective variables because I’m curious to see how correlated they are with my QM results and with each other, and because I want to leave open the possibility of doing other analyses that come to mind and seem interesting.
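
A minimal sketch of the simplest Bayesian version of ‘QM score adjusted for sleep’: a linear regression of QM score on an intercept, a regular-coffee indicator, and hours slept, with independent zero-mean normal priors and the noise standard deviation treated as known. Fixing the noise SD is a simplifying assumption made here so that the posterior has a closed form; a fuller analysis would also infer it. The prior scales and `noise_sd` are placeholders, not values chosen for this experiment.

```python
import numpy as np


def posterior_coefficients(regular, sleep_hours, scores,
                           prior_sd=(1000.0, 50.0, 50.0), noise_sd=20.0):
    """Conjugate Gaussian posterior for score = b0 + b_caf*regular + b_sleep*sleep.

    Assumes zero-mean independent normal priors with SDs `prior_sd` and a known
    noise SD. Returns (posterior_mean, posterior_cov) over (b0, b_caf, b_sleep).
    """
    X = np.column_stack([np.ones(len(scores)),
                         np.asarray(regular, dtype=float),
                         np.asarray(sleep_hours, dtype=float)])
    y = np.asarray(scores, dtype=float)
    prior_precision = np.diag(1.0 / np.square(prior_sd))
    # Posterior precision = prior precision + data precision.
    post_precision = prior_precision + X.T @ X / noise_sd**2
    post_cov = np.linalg.inv(post_precision)
    # With a zero prior mean, the prior adds nothing to this term.
    post_mean = post_cov @ (X.T @ y / noise_sd**2)
    return post_mean, post_cov
```

An approximate 95% interval for the caffeine effect is `post_mean[1] ± 1.96 * sqrt(post_cov[1, 1])`; a wide interval that straddles zero is the diffuse-posterior case.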


This is currently (as of 0301) a list of questions that I came up with for myself, but I’ll also add answers to questions others raise in this section.

Isn’t this too short?

As I mentioned, 14 days is short enough that even though the regular vs. decaf day assignments are randomized and blinded, the ‘statistical power’ of my results will be relatively weak. Two responses to this:

  1. From a decision-theoretic perspective, I mostly care about the easier-to-answer question of whether the effect was large enough that I could accurately detect whether the coffee I had that day was regular or decaf, conditional on what I know about my sleep and other factors.

  2. I’m going to use Bayesian methods and will be more than willing to label the results ‘inconclusive’ if my analysis results in a diffuse posterior.
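
For the decision-theoretic question in point 1, the daily ‘regular or decaf?’ guesses give a direct measure. A minimal sketch, assuming each day’s guess is scored right or wrong and a uniform Beta(1, 1) prior on guessing accuracy: the posterior is Beta(1 + correct, 1 + wrong), and for integer parameters the Beta CDF equals a binomial tail, so P(accuracy > 0.5) can be computed exactly without SciPy. The `verdict` helper and its 0.9 threshold are illustrative choices, not part of the plan.

```python
from fractions import Fraction
from math import comb


def prob_better_than_chance(correct: int, wrong: int) -> Fraction:
    """P(accuracy > 0.5) under a Beta(1 + correct, 1 + wrong) posterior.

    Uses P(X <= x) = P(Binomial(a + b - 1, x) >= a) for X ~ Beta(a, b)
    with integer a, b, evaluated at x = 1/2.
    """
    a, b = 1 + correct, 1 + wrong
    n = a + b - 1
    return Fraction(sum(comb(n, j) for j in range(a)), 2**n)


def verdict(correct: int, wrong: int, threshold: float = 0.9) -> str:
    """Label the result 'inconclusive' unless the posterior is lopsided enough."""
    p = prob_better_than_chance(correct, wrong)
    if p >= threshold:
        return "can tell"
    if p <= 1 - threshold:
        return "worse than chance"
    return "inconclusive"
```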

Why ‘99%’ blinded?

I’m calling this 99% blinded because there is a slight visual difference between the two coffee grounds that I could in theory detect while making my morning coffee. By making my coffee in the dark (I do this already) and having the bags pre-sorted so I barely have to look at them, I hope to minimize the likelihood of ‘de-blinding’ the experiment. I tried to minimize the likelihood further by buying identical decaf and regular grounds but unfortunately couldn’t find a seller that sold the same beans in both decaf and regular. Instead, I settled for buying beans from the same region with the same flavor profile (I also don’t have a very good sense of taste) so as to limit the difference to a visual one.

Data Collection

Recording subjective metrics and sleep duration in this Google spreadsheet (to make export to CSV easy).
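
Once the spreadsheet is exported to CSV, the check-ins can be loaded with the standard library. A minimal sketch; the column names (`day`, `time`, `alertness`, `sharpness`, `mood`, `headache`, `guess`) are hypothetical stand-ins, not the spreadsheet’s actual headers.

```python
import csv
import io
from statistics import mean


def load_checkins(f):
    """Parse check-in rows from an open CSV file with hypothetical headers."""
    rows = []
    for r in csv.DictReader(f):
        rows.append({
            "day": int(r["day"]),
            "time": r["time"],                    # e.g. 'morning', '1pm', '6pm'
            "alertness": int(r["alertness"]),     # 1-5
            "sharpness": int(r["sharpness"]),     # 1-5
            "mood": int(r["mood"]),               # 1-5
            "headache": r["headache"].strip().lower() == "yes",
            "guess": r["guess"].strip().lower(),  # 'regular' or 'decaf'
        })
    return rows


def daily_mean(rows, metric):
    """Average a 1-5 metric over each day's check-ins."""
    by_day = {}
    for r in rows:
        by_day.setdefault(r["day"], []).append(r[metric])
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


sample = io.StringIO(
    "day,time,alertness,sharpness,mood,headache,guess\n"
    "1,morning,4,4,3,no,regular\n"
    "1,1pm,3,3,3,no,regular\n"
)
print(daily_mean(load_checkins(sample), "alertness"))  # {1: 3.5}
```

With a real export, replace the `StringIO` sample with `open("export.csv", newline="")` (the file name is a placeholder).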

Below, I’m also recording miscellaneous observations.


Day 7 (03/09)

Quantified Mind Practice Effects

My Quantified Mind results are definitely improving, in large part due to practice effects. This is in spite of my trying to use the same strategies for the different tests rather than improve them. For example, there’s a test in which I have to select a number between 1 and 9 based on a picture, and on the first day I set up my hands such that my pinky was on the 0 (which isn’t an option in the test). This positioning is unnatural for me, and in hindsight I should have started with my pinky on the 9. But, to keep things consistent and prevent unnecessary confounding, I’ve stuck with my original hand positioning for all subsequent tests.


Qualitative Observations

I’m done! Made it through the withdrawal headaches. I haven’t done much analysis yet, but here are a few of my initial observations, some of which I won’t be able to verify with analysis.

  1. I did pretty well at identifying which days were caffeine vs. decaf days. I only made two mistakes, and on one of them I had a hunch, in hindsight, that I’d guessed wrong.

  2. Decaf days affected my actual subjective productivity less than expected. The main beneficial effect of caffeine seemed to be that it lowered the activation energy for me to get started on tasks, and on days when I’d slept well, it seemed to add a certain ‘sharp’ quality to my thinking.

  3. Sleep matters. This is something I’m hopeful I’ll be able to get at least some signal on. Anecdotally, especially if we ignore the headaches (which were a result of withdrawal, not of drinking decaf coffee in general), the differences in all my subjective metrics seemed to correlate much more with how much sleep I got the night before than with regular vs. decaf coffee.

  4. Caffeine may not help me do better when sleep deprived. As mentioned above, I do notice a small subjective positive effect on my ‘sharpness’ when I sleep really well, have caffeine, and fast (which I do most days until lunch). On the other hand, on days on which I got <7 hours of sleep (this happened before both caffeine and decaf days), I felt like caffeine either made no difference or made me a bit more awake at the cost of making my cognition even fuzzier. I highly doubt this will show up in the Quantified Mind metrics in any detectable way, but I wanted to note it as a hypothesis that I’m very interested in as part of my general interest in mitigating the effects of sleep deprivation.

  5. Credit to Issa Rice for pointing out that withdrawal would be an issue when I proposed the experiment. Withdrawal did turn out to be a bit of an issue, although not enough of one (IMO) to mess up the results of the experiment. My first decaf sequence was two days in a row, and in the afternoon I got a bad withdrawal headache that was resistant to ibuprofen. On later decaf days, I took ibuprofen at the first sign of a headache, and this seemed to largely mitigate withdrawal symptoms. Of course, this does confound my headache tracking a bit, but I view it as worth it in order to try to minimize the effect of withdrawal on other metrics.
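
As a rough check on point 1: if every daily guess were a coin flip, how often would someone get at least that many right? The sketch below assumes one regular/decaf guess per day (14 in total) and reads ‘two mistakes’ as 12 of 14 correct; it computes the one-sided binomial tail exactly. Withdrawal headaches are an obvious cue, so a small number here says the days were distinguishable, not why.

```python
from fractions import Fraction
from math import comb


def chance_of_at_least(k: int, n: int) -> Fraction:
    """P(at least k of n guesses correct) if every guess is a fair coin flip."""
    return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2**n)


p = chance_of_at_least(12, 14)
print(p, float(p))  # 53/8192 0.0064697265625
```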

(Where I’ll record graphs and other summary statistics.)
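
For the hypothesis in point 4 of the qualitative observations, one simple summary is a 2×2 table of average sharpness by coffee type and by whether the previous night’s sleep was under 7 hours, plus the difference-in-differences between the two sleep groups. A minimal sketch; the field names are hypothetical, and with 14 days some cells may hold only one or two days, so this is descriptive only.

```python
from statistics import mean


def sharpness_by_cell(days, cutoff=7.0):
    """Mean sharpness per (coffee, short/long sleep) cell, plus the interaction.

    days -- iterable of dicts with hypothetical keys 'coffee' ('regular'/'decaf'),
            'sleep_hours', and 'sharpness' (that day's average 1-5 rating).
    Returns (cell_means, interaction): the caffeine effect after >= cutoff hours
    of sleep minus the caffeine effect after < cutoff hours, or None if any
    of the four cells is empty.
    """
    cells = {}
    for d in days:
        sleep = "short" if d["sleep_hours"] < cutoff else "long"
        cells.setdefault((d["coffee"], sleep), []).append(d["sharpness"])
    means = {cell: mean(vals) for cell, vals in cells.items()}
    try:
        effect_long = means[("regular", "long")] - means[("decaf", "long")]
        effect_short = means[("regular", "short")] - means[("decaf", "short")]
    except KeyError:
        return means, None
    return means, effect_long - effect_short
```

A positive interaction would be consistent with caffeine helping more after a full night’s sleep, though at this sample size it cannot separate that from noise.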