Be less scared of overconfidence


When I was deciding whether to work for Wave, I got very hung up on the fact that my “total compensation” would be “lower.”

The scare quotes are there because Wave and my previous employer, Theorem, were both early-stage startups that were paying me mostly in equity (or, less charitably, fake startup bucks). To figure out the total compensation, I tried to guess how much money the equity in each company was worth, with a thought process something like:

  • Both of these companies have been invested in by reputable, top-tier venture capitalists.

  • The market for for-profit investments is pretty efficient, and most people who think they can do better are being overconfident.

  • Who am I, a lowly 22-year-old programmer, to disagree with reputable top-tier venture capitalists? I should defer to them about the valuations.

So I valued the equity by taking the valuation each company’s VCs had invested at and multiplying it by the fraction of the company my shares represented. That number was higher for Theorem than for Wave.
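In code, that naive method is a single multiplication. The sketch below assumes “the valuation the VCs invested at” means the post-money valuation of the latest round, and the figures are hypothetical, not Wave’s or Theorem’s actual numbers:

```python
def naive_equity_value(post_money_valuation: float, ownership_fraction: float) -> float:
    """Value a startup stake at the price set by the company's latest VC round.

    This is the "defer to the VCs" method: it treats the round's valuation
    as an accurate estimate of what the whole company is worth.
    """
    if not 0.0 <= ownership_fraction <= 1.0:
        raise ValueError("ownership_fraction must be between 0 and 1")
    return post_money_valuation * ownership_fraction


# Hypothetical figures for illustration only.
company_a = naive_equity_value(post_money_valuation=50_000_000, ownership_fraction=0.001)
company_b = naive_equity_value(post_money_valuation=20_000_000, ownership_fraction=0.002)
print(company_a, company_b)
```

Nothing you know about the company enters this calculation except the price the last investors paid.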

Seven years on, the Wave equity turned out to be… a lot more valuable. That raises the question: how dumb was my take? Was the actual outcome predictable if I’d thought about it in the right way?

I don’t think it was perfectly predictable, but I do think I shouldn’t have been that anchored to the market-efficiency reasoning. Those respectable, top-tier VCs had YOLOed those valuations after a couple one-hour meetings, because that’s how early-stage VC works. Meanwhile, I had worked at Theorem for a year and my then-partner had worked at Wave for nine months. Heck, I had gotten more founder time than those VCs had just during my interview process. I had way more information than “the market.”

If I’d had the confidence to use that information, I might have thought something like:

  • After its funding round, Wave continued to add users at one of the fastest paces their investors had ever seen, whereas Theorem is struggling to grow.

  • Theorem is constrained by its ability to do sales, and the founders don’t seem to be acting with enough focus or urgency to unblock that constraint. Instead, they’re distracting themselves with things like hiring machine learning interns (i.e. me).

  • The founders of Wave seem much smarter, more relentlessly resourceful, and more trustworthy.

  • Given the above, I should value the Wave equity way more even though its naive expected value is less than the Theorem equity.

Fortunately, I chose Wave for other reasons. But this thought pattern—throwing away most information in fear of using it to make overconfident judgments—shows up all the time. I’m here to tell you why I hate it.

In January 2020, my entire Twitter timeline was freaking out about a novel-seeming respiratory disease spreading in Wuhan.

Part of me thought:

  • All the reputable, top-tier technocrats are ridiculing the freaked-out people.

  • Usually, when a ragtag band of Internet weirdos thinks they know better than a large group of reputable, top-tier technocrats, the Internet weirdos are being overconfident.

  • So the technocrats are probably right on this one.

Another part of me thought:

  • Huh, the simple model of “this thing has a fast exponential growth rate and spreads when people are asymptomatic so it’s very hard to stop” seems like a compelling reason to think things will be quite bad.

  • When reputable, top-tier technocrats say not to freak out, they don’t usually address the best arguments in favor of freaking out, and they often seem like they don’t understand how exponential growth works.

  • Maybe I’ll buy a lot of beans in case everything goes to shit.
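The “simple model” in the first bullet is just compounding. A sketch, assuming a hypothetical doubling time of three days and a starting count of 100 cases (illustrative numbers, not epidemiological estimates):

```python
def cases_after(days: float, initial_cases: float, doubling_time_days: float) -> float:
    """Project case counts under pure exponential growth with a fixed doubling time."""
    return initial_cases * 2 ** (days / doubling_time_days)


# Hypothetical parameters chosen only to show the shape of the curve.
for days in (0, 15, 30, 45, 60):
    print(days, round(cases_after(days, initial_cases=100, doubling_time_days=3)))
```

With these assumptions every 30 days multiplies the count by 2^10, roughly 1,000, so a number that looks small today is enormous two months later unless something changes the growth rate.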

(I also contemplated the fact that the stock market didn’t seem to be freaking out, but I decided that since most people can’t beat the stock market, I probably wouldn’t either. Some braver souls than I bought puts on the S&P 500 and made a killing.)

In both of these situations, I had some mental model of what was going on (“this epidemic is growing exponentially,” “this startup seems good”) based on the particulars of the situation, but instead of using my internal model to make a prediction, I threw away all my knowledge of the particulars and instead used a simple, easy-to-apply heuristic (“experts are usually right,” “markets are efficient”).

I frequently see people leaning heavily on this type of low-information heuristic to make important decisions for themselves, or to smack down overconfident-sounding ideas from other people.

  • This startup is growing incredibly fast and the founders are some of the most effective people I’ve ever met, but at their current VC valuation, the total comp is lower than my Big Tech job so I can’t justify the move.

  • I think I could have a big impact as an academic researcher, but most grad students end up depressed and don’t land a tenure-track position, so it’s not worth trying.

  • You’re going to start a company? Are you aware that 90% of startups fail? What makes you think you and your ragtag band of weirdos are the chosen ones?

  • Who are you to be sounding the alarm about a pandemic when every past alarm has been false and all the reputable, top-tier experts say not to worry?

These all place way too much weight on the low-info heuristic.

A heuristic like that can make a good starting point when you’re not an expert in an area and don’t have much time to think about it or dig in. That’s useful in theory, but in practice people don’t confine these heuristics to that regime: they fall back on them even in high-context, high-investment situations, where it’s silly to throw away so much detailed context about the particulars.

What’s worse, these low-info heuristics almost always push in the direction of being less ambitious, because the low-info view of any ambitious project is that it will fail (most projects run behind schedule, most startups fail, most investors underperform the market, etc.).

The problem is that the bad consequences of underconfidence and under-ambition are severe but subtle, whereas the bad consequences of overconfidence and wishful thinking are milder but more obvious. If you’re overconfident, you’ll try things that fail, and people will laugh at you. If you’re underconfident, you’ll avoid making risky bets, and miss out on the potential upside, but nobody will know for sure what you missed.

That means it’s always tempting to do what the low-info heuristic tells you and be less ambitious—but ultimately, that ends up being worse for the world.

Why do people find low-info heuristics so compelling? A few potential reasons:

  • Many (most?) attempts to reason via specific details are wrong. Most people who think “I’m going to beat the market” don’t; most people who think “I know better than all the experts” are less Balaji Srinivasan and more Time Cube guy.

  • The reasoning and evidence backing up low-info heuristics is (relatively) legible and easily verifiable. If I claim “90% of startups fail,” I can often cite a study for support. Whereas if I claim “the markets aren’t freaking out enough about COVID,” I’d need to make a much more complicated argument to explain my reasoning.

  • It’s relatively straightforward to reason with low-info heuristics even when you’re not an expert in the domain. For something like a forecasting challenge, where forecasters need to make predictions across a wide range of topics and can’t possibly be experts in all of them, this is very important.

  • Because it’s much more objective, reasoning via low-info heuristics gives you many fewer opportunities to fall prey to biases like optimism bias, motivated reasoning, the planning fallacy, etc.

Those are all real advantages! Low-info heuristics are a great way to be more-or-less right most of the time as a non-expert, and to limit your vulnerability to overconfidence and wishful thinking.

The problem is that there are lots of ways that low-info heuristics fail or can be improved on.

For example, the efficient market hypothesis (“asset prices incorporate all available information, so it’s hard to beat the market,” which I used in the example above to infer that “venture capitalists value companies correctly”) is justified by economic theory that relies on a few assumptions:

  • Low transaction costs: The cost of doing a trade in the market (in this case, an investment) must be near-zero so that people can use any mispricings to get rich.

  • Enough smart money: The well-informed and rational players in the market need to have enough capital to take advantage of any pricing inefficiencies that they notice.

  • No secrets: The “available information” must be available to enough of the smart money that it can be used to correct mispricings.

  • Ability to profit: There must be a way for a smart market participant to make money from a mispriced asset.

In the case of venture capital, many of these assumptions are super false. Fundraising takes a lot of time and money: transaction costs are high. Venture capitalists YOLO their valuations after a few meetings: they frequently miss important information. And it’s impossible to short-sell startups, so there’s no market mechanism to correct an overpriced company. You can see the outcome of this in the fact that there are venture capitalists that consistently beat “the market’s” returns.

But it’s not just venture capital: almost no markets fully satisfy the conditions of the EMH, and many important markets—like housing or prediction markets—strongly violate them.

Or consider the heuristic that “if internet weirdos disagree with experts, the experts are right.” What community of Internet weirdos and what community of experts? Some communities of experts are clearly bonkers, like the victims of the Sokal hoax. In other cases, a community with expertise in one narrow area might not have the context in adjacent areas or the ability to do the first-principles thinking necessary to apply their expertise correctly in the real world. For example, doctors are experts in medicine, and thus are often expected to make medical diagnoses, but only 21% of doctors are capable of doing the elementary statistical calculations necessary to turn a medical test result into the probability of having a disease.
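The calculation in question is Bayes’ rule. A minimal sketch follows; the prevalence, sensitivity, and false-positive rate are hypothetical round numbers, not figures from the study behind the 21% statistic:

```python
def probability_of_disease(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(disease | positive test) via Bayes' rule.

    prevalence:          P(disease) before testing
    sensitivity:         P(positive | disease)
    false_positive_rate: P(positive | no disease)
    """
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: a condition with 1% prevalence and a test that is
# 90% sensitive with a 9% false-positive rate.
p = probability_of_disease(prevalence=0.01, sensitivity=0.90, false_positive_rate=0.09)
print(f"{p:.1%}")  # → 9.2%
```

Even with a fairly accurate test, a positive result leaves the patient only about 9% likely to have the condition, because false positives among the 99% who are healthy outnumber true positives among the 1% who are sick.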

Or consider the heuristic of the outside view: “the outcome of this situation will probably be similar to the outcome of similar past situations.” Suppose you’re using this to judge how likely a startup is to succeed. Sure, you could predict it based on the distribution of outcomes across all startups at a similar stage and valuation. But that would throw away almost all the information you have about the particular startup at hand. It ignores tons of important questions, like how fast the company is growing and how determined its founders are.

You could imagine trying to incorporate info like this into your outside-view analysis, by, e.g., looking at outcomes specifically of all startups that have grown by 10x in a single year. But that kind of information is so private and closely guarded that you probably can’t do that analysis. For some of the other traits, e.g. “how determined are the founders,” we don’t even have a good enough way of measuring that trait that you could do the analysis even in principle.

Sometimes I see people use the low-info heuristic as a “baseline” and then apply some sort of “fudge factor” for the illegible information that isn’t incorporated into the baseline, something like “the baseline probability of this startup succeeding is 10%, but the founders seem really determined, so I’ll guesstimate that gives them a 50% higher probability of success.” In principle I could imagine this working reasonably well, but in practice most people who do this aren’t willing to apply as large a fudge factor as is appropriate. As Mark Xu writes in “Strong Evidence is Common”:

One time, someone asked me what my name was. I said, “Mark Xu.” Afterward, they probably believed my name was “Mark Xu.” I’m guessing they would have happily accepted a bet at 20:1 odds that my driver’s license would say “Mark Xu” on it.

The prior odds that someone’s name is “Mark Xu” are generously 1:1,000,000. Posterior odds of 20:1 implies that the odds ratio of me saying “Mark Xu” is 20,000,000:1, or roughly 24 bits of evidence. That’s a lot of evidence.

… One implication of the Efficient Market Hypothesis (EMH) is that it is difficult to make money on the stock market. Generously, maybe only the top 1% of traders will be profitable. How difficult is it to get into the top 1% of traders? To be 50% sure you’re in the top 1%, you only need 200:1 evidence. This seemingly large odds ratio might be easy to get.
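Both figures in the quoted passage come from Bayes’ rule in odds form: posterior odds = prior odds × likelihood ratio, and evidence measured in bits is log₂ of the likelihood ratio. The sketch below reproduces the name example, works the trading example under the quote’s own assumptions (a 1% prior, a 50% target), and applies the same arithmetic to the hypothetical “10% baseline, 50% higher” fudge factor from earlier. Under those assumptions the ratio needed for the trading example is 99:1, so the quoted 200:1 is more than enough: it would put you at roughly 67%.

```python
import math


def likelihood_ratio_needed(prior_odds: float, posterior_odds: float) -> float:
    """Bayes' rule in odds form: posterior_odds = prior_odds * likelihood_ratio."""
    return posterior_odds / prior_odds


def bits(likelihood_ratio: float) -> float:
    """Strength of evidence in bits (base-2 log of the likelihood ratio)."""
    return math.log2(likelihood_ratio)


def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)


# Name example: prior odds 1:1,000,000, posterior odds 20:1.
name_lr = likelihood_ratio_needed(prior_odds=1 / 1_000_000, posterior_odds=20)
print(round(name_lr), round(bits(name_lr), 1))

# Trading example: a 1% prior is odds of 1:99; 50% confidence is odds of 1:1.
trader_lr = likelihood_ratio_needed(prior_odds=1 / 99, posterior_odds=1)
print(round(trader_lr))

# Posterior probability of being in the top 1% after 200:1 evidence.
print(round(odds_to_probability((1 / 99) * 200), 3))

# Fudge factor: raising a 10% baseline by 50% (to 15%), expressed as evidence.
fudge_lr = likelihood_ratio_needed(prior_odds=0.10 / 0.90, posterior_odds=0.15 / 0.85)
print(round(fudge_lr, 2), round(bits(fudge_lr), 2))
```

The fudge factor works out to a likelihood ratio of about 1.6, under one bit, while simply hearing someone state their name carries about 24 bits.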

In fact, outperforming low-info heuristics isn’t just possible; it’s practically mandatory if you want to have an outsized impact on the world. That’s because leaning too heavily on low-info heuristics pushes people away from being ambitious or trying to search for outliers.

Most important things in life—jobs, hires, companies, ideas, partners, etc.—have a distribution of outcomes where the best possible choices are outliers that are dramatically better than the typical ones. In my case, for example, choosing to work at Wave was probably 10x better than staying at my previous employer: I learned more, gained responsibility faster, had a bigger impact on the world, etc.

Unfortunately, low-info heuristics tell you that outliers can’t exist. By definition, most members of any group are not outliers, so any generalized heuristic will predict that whatever you’re looking at isn’t an outlier either. If you index too heavily on what the average outcome is, you’re deliberately blinding yourself to the possibility of finding an outlier.
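To make this concrete, consider a hypothetical pool of 1,000 candidates (jobs, startups, ideas), 10 of which are outliers. A heuristic that always predicts “not an outlier” is right 99% of the time and still finds none of them. A sketch:

```python
def evaluate(predictions: list[bool], actual: list[bool]) -> tuple[float, int]:
    """Return (accuracy, number of true outliers the predictor flagged)."""
    correct = sum(p == a for p, a in zip(predictions, actual))
    outliers_found = sum(p and a for p, a in zip(predictions, actual))
    return correct / len(actual), outliers_found


# Hypothetical pool: 1,000 candidates, the first 10 of which are outliers.
actual = [i < 10 for i in range(1000)]

# The low-info heuristic: give everyone the base rate's answer.
always_typical = [False] * len(actual)

print(evaluate(always_typical, actual))  # → (0.99, 0)
```

The high average accuracy is what makes the heuristic feel safe, and it is also why the heuristic can never point you to the rare option worth choosing.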

This is especially bad when someone uses this kind of reasoning to smack down other people’s ambition, because the payoffs are asymmetric. If you incorrectly tell someone that their ambitious idea is likely to succeed, then they’ll waste their time on a failed idea, which is not great, but ultimately fine. But if you smack them down with low-info heuristics and convince them their idea is likely to fail, you rob the world of an awesome idea that would have existed otherwise. Shame on you! (Too bad you’ll never know about it.)
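The asymmetry can be put in expected-value terms. A sketch with hypothetical numbers: an idea with a 10% chance of success, a payoff of 100 units if it works, and a cost of 1 unit (the time spent) if it fails:

```python
def expected_value(p_success: float, upside: float, downside: float) -> float:
    """Expected payoff of attempting a project; downside is the loss if it fails."""
    return p_success * upside - (1 - p_success) * downside


# Hypothetical numbers for illustration only.
attempt = expected_value(p_success=0.10, upside=100.0, downside=1.0)
talked_out_of_it = 0.0  # nothing attempted, nothing gained or lost

print(attempt, talked_out_of_it)
```

A “90% of these fail” heuristic only looks at the failure probability; whether the attempt is worthwhile depends on how large the upside is relative to the cost of failing.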

OK, so what should you do instead of relying on low-info heuristics? Here are my suggestions:

  • Build gears-level models of the decision you’re trying to make. If you’re deciding, e.g., where to work, try to understand what makes different jobs awesome or terrible for you.

  • Think really hard about the problem. Most inside views are wrong—to stand a fighting chance of beating the outside view, you’ll need to put a lot of effort in.

  • Don’t fool yourself with motivated reasoning. Stress-test your ideas; ask yourself what the best arguments against your inside view are and see if you can rebut them.

    • To the extent that you do use low-info heuristics, use them as a stress test rather than a default belief. “90% of startups fail” is useful to know as a warning to try to mitigate failure modes. It’s dangerous when you hear it and stop thinking there.

  • Don’t be afraid to try ambitious things where the downside of failing is low, and the upside of succeeding is high!

Thanks to draft readers Irene Chen, Milan Cvitkovic, and Sam Zimmerman.