Lessons learned from offering in-office nutritional testing


Introduction

I’ve talked previously about my concerns with nutritional deficiencies in effective altruists who go vegan for ethical reasons, especially those who don’t have a lot of contact with the broader vegan culture. No one else seemed very concerned, so I launched a tiny project to test people in this group and see if they were in fact deficient. This is a report on the latest phase of the project.

To cut to the chase:

  • It was very easy to find lots of deficiencies, although due to a severely heterogeneous sample and lack of a control group this doesn’t provide useful information about whether veganism is at fault.

  • Finding these deficiencies probably leads to useful treatment, but not as much as I’d hoped.

  • There are still a lot of operational issues to work out. My guess is that the ideal would require more work (to encourage participants to act on their results) or less (by focusing on education but not providing testing).

  • I am currently looking for a co-founder to properly investigate the impact of veganism on nutrition.

My main question here was “is there low-hanging fruit in treating nutritional deficiencies in this group, and if so how do we pluck it?” An important part of that is “how prevalent are deficiencies?”, but I had substantially more uncertainty around “do people treat deficiencies you find?” and “does the treatment lead to improvements in anything we actually care about?” That prioritization (and budget issues) led the experimental design to focus on operational issues and outcomes, and to deprioritize getting the kind of clean data that would let me compare vegan and non-vegan outcomes. Similarly, this write-up is mostly focused on showing the problem exists at all and building metis of investigation and treatment, rather than estimating prevalence.

Which is to say, to everyone planning on @ing me to complain about the sample size, heterogeneity, or mediocre statistics: you are right that this sample is not very informative about base rates of deficiencies in vegans or anyone else. If someone claimed it was, they would be committing an epistemic sin. However, this particular post is focused on “how much effort is it to get nutritional issues addressed, and is that effort worth it?” Given that, any complaints about the terrible sampling will be considered to be offers of assistance in running the much larger study that could answer the prevalence question.

Background

(if you’ve read the previous posts, this will be review)

Last year I worked in a co-working space focused on existential risks, which is anything that might really end everything. Because x-risk is a popular topic within the effective altruism movement, many participants in the space were EAs. Another big topic within effective altruism is animal suffering, especially farmed animal suffering. This has led many EAs in the office to go vegan or at least vegetarian for ethical reasons, without making animals the focus of their lives. And when I asked them, many had put no thought into how to give up animal products in a healthy way.

Which of course would have been fine if they were eating well. My personal opinion is that most people’s optimal diet contains small amounts of animal products, but lots of people eat suboptimally for lots of reasons and I don’t consider it my problem. But the number of fatigue and concentration issues was… I don’t actually know if it was high. I don’t know what baseline to compare it to. But it wasn’t low enough to reassure me. Neither did the way the vegans talked about nutrition, and in particular the fact that more than one person was doing keto vegan and still wasn’t investigating nutrition.

None of this meant there was necessarily a problem, but I was suspicious. So I got a small grant to solve this with numbers.

First I tried broad spectrum testing, which led me to identify iron deficiency as the main concern.

Then I did a short investigation into the actual costs of iron deficiency, which motivated a number of people to get tested, some of whom got in touch after the fact.

In my last post I implied I was stopping the project, and wrote next steps as if someone else would be doing them. I ended up getting a follow-on grant, so I pushed forward with round two on my own, which is what this post is about.

Round 2 concrete steps

I followed the steps laid out in my previous post almost exactly.

  1. I brought in a company to draw blood from participants in the Lightcone office. This was expensive, probably 2-4x having people buy lab orders online, which is still more expensive than doctor-ordered tests with insurance.

    1. I say I brought them in – I in fact hired someone to handle the company and interface with participants. Aaron Silverbook was great, pretty much best case scenario for hiring a subcontractor, although still not as good as an invested co-founder.

  2. Based on previous results and nutritional folk wisdom my test priorities were ferritin (the best proxy for iron), B12, and Vitamin D. I threw in some other iron tests because they were very cheap.

  3. We had applicants fill out a form detailing their diet and any fatigue issues.

  4. Aaron picked 20 participants and made a schedule for testing them. Prioritization was: Fatigue issues > vegan > vegetarian > whoever happened to be in the office that day.

    1. This was my favorite part of having a contractor because scheduling involved some very intense math around prioritization and availability and from my perspective it was no work at all.

    2. Prioritizing people with fatigue issues probably made the immediate impact higher, and was useful for estimating the upper bound of possible impact, but it ruined the sample for calculating base rates. It means I can’t really compare vegans vs. omnivores, because both were selected for having potentially-nutrition-caused problems. I had hoped to maybe compare vegans with fatigue to omnivores with fatigue (if the tired vegans had more nutritional issues, that would be pretty suggestive), but ultimately didn’t have enough data to bother.

    3. I flat out didn’t have enough money to do a good study capable of establishing high confidence prevalence rates, even if that had been my primary goal. Putting aside sample size, good sampling requires getting representative participants, many of whom have no reason to get tested. They’re not intrinsically motivated, so you have to pay them to get tested.

  5. The company came into the office. There were a bunch of day-of headaches that Aaron handled beautifully and that I was completely uninvolved with because he is good at his job.

  6. Results went out. Some participants received results over email, some were told to go to the company’s portal, and some may never have gotten a notification at all. More on this in the Difficulties section.

  7. I asked Aaron to get results from participants to me, for analysis. This went less well- I wasn’t sufficiently clear on what information I wanted in the form, and he had gotten busy with other stuff and had less time to devote to my project, which had already run longer than anticipated due to issues with the lab. The response rate was poor- we eventually got results from 13 people out of 20.

  8. I sent out a second form asking for more information, and a bunch of emails harassing people to complete both forms. The response rate continued to be poor: only 8 people this time.

  9. 3 months after testing I followed up with people with the worst scores to see if they had gotten treatment.

Project Difficulties

Other People’s Fault

I cannot say enough bad things about BayAreaPLS, the company we hired to come into the office and do testing.

First and most importantly, they just forgot to run the ferritin test for half the participants. They did the other tests, there was no reason to not do that one in particular, they just… didn’t. Their initial attempt to make up for this was an offer to not charge me for tests they didn’t do. I pushed back fairly hard and they agreed to some actual discounting. It’s been 2.5 months and they have yet to send me that follow-up invoice, which I guess is technically good for me but makes me feel worse overall.

Second, the results were hard to retrieve. Some participants never got results at all even after following the instructions, and I have no way to debug whose fault that is. Others didn’t bother to retrieve results because it was too hard. The happiest people were the ones who got their results over email, which is a HIPAA violation the company claims to have done by accident. Much like the lack of the second invoice, the emailed results are technically good for me but leave me more concerned about the company.

Even for people who did get results it took almost two weeks, which was not great for momentum.

Lastly, the company used different deficiency thresholds for different people. They say this was based on sex and age, but they didn’t get that information from everyone (sounds like incompetence on their part), so some people got a list of thresholds instead. This created a moderate amount of confusion and friction, especially because I couldn’t tell what variation came from age/sex (which I mostly want to ignore) vs. the norms of a particular test (which I very much don’t). People are slow to act on results in general, so the additional friction was quite costly.

My Fault or Inherent Difficulties

Not all of these were literally caused by me, but they all fall under “my responsibility to anticipate and handle”.

I mentioned in my last post that some participants were kind of insistent on getting a lot of help from me, even after I explicitly told them something was outside my bailiwick and needed a doctor. I tried to fix that this time by hiring Aaron. That worked on that one issue, but made it harder to catch new problems. If the plan had been for people to directly show me their results and receive coaching I would have caught the missing test values much earlier. I’m not sure there is a way to have someone available enough to get all the relevant questions from participants without also having to deal with a bunch of irrelevant ones, including inappropriately persistent ones. The difference between the two just isn’t obvious with the knowledge most participants have, and emotions run so high around health stuff.

It’s hard to estimate the rate of acting on results because the people who don’t act are less likely to fill out the follow-up surveys, but my sense is it’s not good, and probably <50%. I also strongly suspect the rate of acting on results would have been higher if there had been in-person follow-up.

The guidelines I sent participants emphasized vitamin D and ferritin because it didn’t occur to me anyone could see an anemia result on the page and not rush to treat it, but at least two people scored as anemic and, as of three months later, had not treated it.

The lab obeyed HIPAA enough to only send results to participants, not directly to me, so I needed participants to forward them. Of 20 participants, 13 did so. Only 8 participated in the separate follow-up questionnaire (arguably my fault for asking for two separate forms, but there was good reason to ask for results quickly and do the follow-up a little later). 20 people with varying motivations for testing was never going to be strong evidence for anything systemic, but the low response rate makes it even harder to draw conclusions.

My emails to participants sometimes went to spam, possibly due to use of BCC, which was necessary to meet the privacy commitments I made in the application.

What were the total costs?

As a ballpark:

  • ~$270 per participant (if they had done all the tests for everyone).

  • 0.5 hours per participant, including follow-up and the disruption to their schedule. If you want to be really conservative you could call this an hour to account for the disruption from transitions.

  • ~10 hours of Aaron’s time

  • ~20 hours of my time

  • $30 per deficient person for supplements (not covered by the study)

If you value everyone’s time roughly the same, that means that to break even we need to save one person 30 hours + ~$5700 (ignoring the information gained).

If you want to complicate that math you can add any of: discount rate on time or money, exchange rate between ops time/my time/participant time, cost of unnecessary or counterproductive treatment, the fact that knowledge gained from this round can make the next round cheaper, and the fact that most of my time went to a write-up that shouldn’t really be billed to participants (though a better program would have spent more time with participants).
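For concreteness, the ballpark line items above can be totaled in a few lines. This is a sketch, not the post’s exact accounting: the count of ten supplement-buying participants is my own illustrative assumption (only the per-person supplement cost is given), and I count all hours at face value.

```python
# Rough break-even sketch using the ballpark figures above.
# ASSUMPTION: ~10 of the 20 participants were deficient enough
# to buy supplements; that count is mine, not from the post.
participants = 20
test_cost = 270        # $ per participant, if all tests had been run
supplement_cost = 30   # $ per deficient person (not covered by the study)
deficient = 10         # assumed count

dollars = participants * test_cost + deficient * supplement_cost
hours = 0.5 * participants + 10 + 20  # participant time + Aaron's + mine

print(f"${dollars} and {hours} hours")  # $5700 and 40.0 hours
```

Netting out some of my write-up time (per the billing caveat above) brings the hours close to the 30-hour break-even figure.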

Results and Benefits

Test Results

I should remind you here that the sample was a mix of 20 ethical EA vegans, vegetarians, people with fatigue issues, and people who happened to be in the office. Even a very large sample wouldn’t be perfectly predictive unless you had the exact same mix of participant types, and this sample was tiny. So the right way to look at these results is “is this enough to think there’s a problem?” and “did people do helpful things with the information?” not “at what exact rate does this population develop ferritin issues?”

With those caveats, here is the data from participants that reported back. I have ferritin results for 8 people and all other results for 13.

  • 85% reported energy problems

  • 15% were honest-to-god anemic

  • 65% had low ferritin (30% clinically deficient) (none of the anemic people had ferritin tests done, so there is no overlap between this group and the anemics. There were non-anemic people without ferritin tests, so I assume this is a coincidence)

  • 60% had low vitamin D (15% clinically deficient)

  • A total of 80% had low scores in at least one of hemoglobin, ferritin, or vitamin D

  • 0% had low B12 (many were on supplements, but I haven’t correlated that with serum B12 levels because at this sample size and heterogeneity there is no point)

Some of the non-reporting was random due to the lab’s incompetence, but it’s not impossible that unhealthy participants were more likely to report back. If you want to be extremely conservative and assume every missing value was A+ healthy, the results are still quite concerning:

  • 10% anemic

  • 25% low-in-my-opinion ferritin, 10% clinically deficient (still no overlap with the anemics)

  • 35% low-in-my-opinion vitamin D, 10% clinically deficient

  • B12 is still great, good job everyone

I Lead This Horse To Water – You Won’t Believe What Happened Next

But finding results isn’t very meaningful if no one acts on them. Of the 8 people who filled out the follow-up survey, 75% changed either diet or supplements, and 1 additional person kept going with supplements they would otherwise have dropped. I assume this is overreporting because people who changed things are more likely to respond, but if you assume none of the nonresponders did anything that’s still 30% of people changing something.

Of the 5 who changed something and answered the relevant question, 40% said they thought they saw an improvement (~1 month after they received results). I don’t consider that particularly strong evidence in either direction- it can take time for deficiencies to heal, but it’s also easy to placebo yourself into seeing an improvement that isn’t there. The real test will be the six-month follow-up.

Three months after testing I followed up with the two identified anemics to make sure they were getting treatment. Neither was, despite having health issues plausibly caused by anemia. They’ve both indicated vague plans to follow up now that I’ve pushed them on it.

Is in-office testing worth it?

I believe this round of testing was better than not doing in-office testing, but there is a lot of room for improvement.

My absolute wild-ass-guess is that this saved between one and ten people (out of twenty participants) from anemia or a moderate iron deficiency, and this improved their life and productivity by 10% to 100% (mean around 20%). I acknowledge these are large ranges, but some problems are bigger than other problems. I’m ignoring vitamin D entirely here because I haven’t even attempted to quantify its value and its ardent fanclub has poisoned the literature.

Even in the worst case, a 5% chance at a 10% improvement is a big deal, so I think this was obviously worth it from a participant perspective. I think it’s a toss-up if it would be worth it for apparently healthy omnivores: my expected value for them is much lower, but people don’t always realize they’re operating at a deficit and catching them requires testing actually-healthy people as well.

I said above that we needed to save someone 30 hours + $6000 for the project to break even. Even one successfully treated anemic will blow that out of the water, so I don’t feel the need to do the more complicated math with discount rates and relative value of time, especially because any future round should either be less work or have a higher response rate.

Of course “break even” isn’t a very high bar. To go even further out on a limb: the median case of mild anemia easily costs someone two hours/day (source: had anemia one time). This testing easily caught the anemia at least six months before it would otherwise have been caught (because everyone who was going to get tested on their own did so when I published my iron post). That is, at a bare minimum, 360 hours someone otherwise wouldn’t have had. That’s a pretty great return for 40 hours (my time + Aaron’s time + participant time) + $6000.
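The back-of-envelope above, made explicit (figures from the post; treating six months as 180 days is my rounding):

```python
# Back-of-envelope return on catching one case of mild anemia early.
hours_per_day = 2         # productivity cost of mild anemia (post's estimate)
days_earlier = 6 * 30     # caught ~6 months sooner; 30 days/month is my rounding
hours_saved = hours_per_day * days_earlier

hours_spent = 40          # my time + Aaron's time + participant time
dollars_spent = 6000

print(hours_saved)                 # 360
print(hours_saved / hours_spent)   # 9.0 hours saved per hour spent
```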

Was the project overall worth it?

I would bet on yes, although a lot of information has yet to come in.

I expect this project to have 6 lasting impacts:

  1. The treated health problems of participants.

  2. The secondary impact via participants’ work.

  3. Public blog posts I wrote, especially my iron deficiency post, which motivated many people to get tested themselves, quite possibly more than received tests directly.

  4. Knowledge of how to do this more in the future.

  5. Influence on Effective Altruism vegan culture.

  6. Animal suffering averted by making veganism more sustainable for participants.

#1 is what I calculated above, and I think it was already a sound success, although not a resounding one.

#2 and #3 require making assessments of all individuals who received testing from the project or due to blog posts I wrote. That’s a combination of “vast amounts of missing information” and “judging individual merit” that makes it really uncomfortable to talk about in a public post.

To be totally honest I’m on an “everything is bullshit” kick right now, so deep in my soul I don’t think this paid off, but intellectually I think my standards are too high and this was better than the average project in the space of existential risk.

#4 and #5 depend on other people following up on this project. I absolutely believe they should, for all the other reasons but also because the return via #6 seems pretty good- health reasons are a common reason vegans go back to eating meat. [It was hard to find a good source, but this shitty source says 26%. I’ll acknowledge that’s probably an overestimate, since health is the most virtuous reason to go back to eating meat.]

I can’t estimate #6 myself. I’m not familiar enough with vegan literature to sort good from bad here, and it wasn’t my main goal.

Then there are costs. The total grant was for a little less than $25k. I didn’t track my time very closely to avoid depressing myself, but my compensation is going to work out to a fraction of my normal hourly rate. If you count foregone client work you could argue the true cost was as high as $50k.

My gut feeling is the project was straightforwardly worth it if you don’t track the foregone work. If you do, impact is dependent on having at least one of:

  • At least one of the impaired and successfully treated participants goes on to do high-impact work.

  • The iron post inspires at least 40 tests total, with a similar rate of finding and treating problems.

  • Follow-up projects exist and do good work.

  • This work leads to more veganism, and you value that a lot.

So can we blame veganism for the deficiencies?

This study doesn’t say anything one way or the other, which means I still think yes, but you shouldn’t change your opinion based on the results. The sample is too small and skewed to compare deficiency rates in vegans and nonvegans. There were energetic omnivores with deficiencies and tired vegans with perfect scores, so it’s clearly not deterministic.

Next steps

I see three possible follow ups to this project:

Nutrition blogging

This is my default, although I don’t plan on writing many of these because there is only so much low-hanging fruit and people have a very limited attention budget. I have to be very judicious in what I suggest.

Mass testing to investigate deficiencies in effective altruism populations

Get the money to test a large enough representative sample, and run the tests with a proper control to actually estimate the cost of nutritional veganism.

This is most useful if there are EA vegan leaders who won’t act on nutritional concerns now but would if the study demonstrated a problem. If this is you, I would love to talk to you about what you would consider sufficient to act on.

Assuming the demand for this information is there, I still don’t think I want to run this project alone. First, it is a lot of work. Second…I know I said “assuming demand is there”, but I can’t picture a scenario where demand exists but no animal EAs consider this project worth working on. A collaborator would be both proof of investment and much better positioned than I am to get the information acted on.

To that end, here is an ad for a co-founder. I will post it on this blog in a few days.

In-office testing with real nutritional counseling

This can work, but only in a limited number of situations. You need a reason (uninformed veganism, high fatigue rates) to suspect nutritional issues in lots of people sharing a space. There needs to be a reason people aren’t getting tested themselves that won’t also inhibit follow up (probably lack of money and existing relationship with a doctor). And even then it’s more of a hits-based model than a sure thing.

My decision is easy because the office I was working out of closed, and in general I think most of the people in the bay area I would want to help have already been reached. The market is saturated for at least a year. There are other offices elsewhere in the world, and if you want to run this yourself I’m happy to act in an advisory capacity (especially if you share data), but it can’t really be an ongoing project in any one city.

Conclusion

I finished most of this post planning on it being the end of my part of the project. I had hopes I would convince someone else to pick up the torch, and maybe even act as an advisor, but it seemed like the biggest problem was participant motivation, which I don’t feel equipped to solve. It was while I was writing this that I realized I wasn’t ready to let the broader issue of vegan nutrition go. I still believe the problem that offended my morals and epistemics is there and worth acting on.

But doing so is still very annoying, which is why I’m looking for someone to partner with on this. Someone who can handle the parts I’m bad at, point out where I’m wrong, and interface with the vegan EA community to get the results acted upon. If you’re interested, please reach out to elizabeth@acesounderglass.com.

Thank you to the Survival and Flourishing Fund for funding this research, and Lightcone Infrastructure for hosting the grant and testing. I inflicted this draft on a number of people but want to especially thank Gavin Bishop. Daniel Filan didn’t beta read this post but he did vegan-check my co-founder ad and suggest the title “I Lead This Horse To Water – You Won’t Believe What Happened Next”.

Crossposted to EA Forum