How Many (Smallish) Organic Compounds Are There?

Introduction

Here’s a question I’ve been pondering recently: how many organic compounds are there? There are multiple ways to interpret the question, each of which leads to consideration of a different potential set of compounds, and each of which is informative in its own way.

First off, what is an organic compound? This sounds straightforward, but because of the history of the term, it isn’t. Originally, “organic compounds” were carbon-containing compounds associated with living things. Over time, the term drifted to include most carbon-containing compounds, except for things like carbon dioxide, minerals like limestone (CaCO3), and allotropes of pure carbon (like diamond, graphite, or carbon nanotubes). Wikipedia defines an organic compound as “generally any chemical compounds that contain carbon-hydrogen bonds”. That definition seems good enough for our purposes[1], so we’ll go with it.

Next, what do we mean by “are there”? Is it substances that have actually been found or made and characterized? A quick check of PubChem says that there are around 100 million (10^8) compounds with information submitted. However, that’s not quite what I was envisioning when I asked the question. The compounds we’ve actually made or isolated (and more importantly, characterized to a greater or lesser extent) are only a tiny fraction of chemical space, which I will loosely define as the set of all possible[2] molecules.

Example Chemical Sub-spaces: Hydrocarbons and Proteins

Organic chemical space contains, in theory, an infinite number of molecules. Consider the set of fully-saturated, unbranched hydrocarbon chains: methane with a single carbon, ethane with two, etc. In principle, one could construct an arbitrarily long chain with no reason to expect chemical instability. In practice, we make polyethylene chains up to about half a million carbon atoms in length that are reasonable approximations of that thought exercise.

Of course, ultra-long hydrocarbons aren’t that interesting chemically, with only one monomer and no functional groups. What about molecules that actually do things? Let’s take proteins, a strong contender for “most interesting class of molecules”. There are 20 naturally-occurring amino acids. A 30-kDa protein[3] has about 300 amino acids, so there are 20^300 (about 10^390) possible combinations of amino acids leading to a protein of moderate size. As a comparison, a quick Fermi calculation gives 10^80 atoms in the known universe.[4]
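The comparison above is easy to check directly. Here is a minimal Python sketch of the arithmetic, using only the figures given in the text and in footnote 4; the variable names are illustrative and not from any source.

```python
import math

AMINO_ACIDS = 20      # naturally occurring amino acids, as in the text
CHAIN_LENGTH = 300    # residues in a ~30 kDa protein, as in the text

# Python integers are arbitrary precision, so 20**300 is computed exactly.
n_sequences = AMINO_ACIDS ** CHAIN_LENGTH
log10_sequences = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

# Figures from footnote 4: mass of ordinary matter and H atoms per kilogram.
ORDINARY_MATTER_KG = 1.5e53
H_ATOMS_PER_KG = 6e26
log10_atoms = math.log10(ORDINARY_MATTER_KG * H_ATOMS_PER_KG)

print(f"log10(number of sequences) = {log10_sequences:.1f}")
print(f"decimal digits in 20**300  = {len(str(n_sequences))}")
print(f"log10(atoms in universe)   = {log10_atoms:.1f}")
print(f"gap, in orders of magnitude = {log10_sequences - log10_atoms:.0f}")
```

Working in log10 keeps the comparison readable: the sequence count has 391 digits, while the atom count sits just under 10^80.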

“Smallish” Organic Molecules

Okay, but that still isn’t quite what I meant to ask when I wondered “how many organic molecules are there?” I’m a synthetic organic chemist, so what I really wondered was “how big is the space of the kinds of molecules that synthetic organic chemists are typically concerned with?” These molecules tend to be:

  • Relatively small compared to the examples we’ve considered so far (mostly less than 500 Da molecular weight)

  • Stable on a timescale of at least hours under conditions achievable in the lab

  • Composed of relatively dense arrangements of rings, chains, and functional groups

These constraints are fairly similar to the space of “potentially pharmacologically active molecules”. Wikipedia led me to a paper by Bohacek, McMartin, and Guida[5] that gives an estimate of 10^63 such molecules. This sounds like an answer to the question I actually wanted to ask, so it’s worth unpacking their calculation further.

Here’s the calculation as Bohacek et al. give it[6]:

Although the number of possible molecules is difficult to estimate accurately, simple considerations show that it must be very large! Consider growing a linear molecule an atom at a time and choosing a carbon, nitrogen, oxygen, or sulfur atom at each position. Some of these atoms can be doubly or triply bonded, but not all combinations of atoms are chemically stable, and some multiple bonds will only be possible in nonlinear structures, i.e., a C=O group. Assuming a very approximate average choice multiplicity of 6, then 6^30 or 2*10^23 molecules could be grown containing 30 atoms. Now consider the ways of introducing branching or cyclization into the resulting structure. Closure of rings with three or more atoms involves selecting two atoms to form a bond and could be achieved in 30*28/2 ways. Making a branched molecule could be achieved by choosing a point to cut the chain and a point in the first part of the chain to attach to the cut end of the second part of the chain (i.e., 30^2 ways). Not all atoms can be joined in this way. However, this will be offset by the fact that when stereochemical considerations are introduced, the number of possibilities will be expanded. Based upon these considerations, approximately 10^40 molecules with up to four rings and 10 branch points could be produced from each linear chain, resulting in a very approximate estimate of 10^63 molecules in total. Although this is a rough estimate, it seems likely that when all the different possible combinations of ring closure and branching are taken into account, the true number will be well in excess of 10^60 and will rise steeply with increasing molecular weight.

We can quibble with some of the choices made in this calculation, but if anything, the final figure of 10^60 is a lower bound. There are other elements besides C, N, O, and S that show up in natural products or other potentially bioactive compounds (B, F, P, Cl, Br, I just for a start) and while the halogens would only substitute for hydrogen, rather than forming rings and chains, each added element contributes multiple orders of magnitude to the total.
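The arithmetic in the quoted estimate can be reproduced in a few lines. The sketch below is in Python with illustrative names. The paper's figure of 10^40 structures per linear chain is taken as given. The "naive" product at the end is only one possible reading of how that figure arises (four independent ring closures and ten independent branch moves); that reading is an assumption of this sketch, not something the quote states.

```python
import math

CHOICES_PER_ATOM = 6   # "average choice multiplicity" from the quote
N_ATOMS = 30           # chain length used in the quote

linear_chains = CHOICES_PER_ATOM ** N_ATOMS   # 6^30 linear molecules
ring_closures = 30 * 28 // 2                  # ways to close one ring
branch_moves = 30 ** 2                        # ways to make one branch

# Stated in the quote and taken as given here: ~10^40 structures per chain.
LOG10_PER_CHAIN = 40
log10_linear = math.log10(linear_chains)
log10_total = log10_linear + LOG10_PER_CHAIN

# ASSUMPTION (not from the paper): treat 4 ring closures and 10 branch
# moves as independent choices and multiply them together.
log10_naive = 4 * math.log10(ring_closures) + 10 * math.log10(branch_moves)

print(f"linear chains: 10^{log10_linear:.1f}")
print(f"total estimate: 10^{log10_total:.1f}")
print(f"naive rings-and-branches product: 10^{log10_naive:.1f}")
```

Under that assumed reading, the naive product lands close to the quoted 10^40, which at least suggests the quoted figures are mutually consistent.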

It’s also fun to compare this number with some estimates about how much carbon is available in various places. Carbon amounts in this section come from this paper. First, there are about 600 Pg of carbon in the atmosphere, mostly as carbon dioxide. That works out to about 3 x 10^40 carbon atoms in Earth’s atmosphere.[7] Between the atmosphere, the oceans, and the terrestrial biosphere, there are around 43,000 Pg of “circulating carbon”, or ~2 x 10^42 carbon atoms.[8] If we add in the estimates of “deep carbon” in the earth’s interior, there are around 1.85 x 10^9 Pg C on the planet, or about 10^47 carbon atoms.[9]
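These conversions all follow the same pattern (petagrams to grams, grams to moles, moles to atoms), so they fit in one small helper. This is a minimal Python sketch with illustrative names; it uses the standard values for Avogadro's number and carbon's molar mass rather than the rounded 12 g/mol implied by the footnotes, so the results differ slightly in the later digits.

```python
import math

AVOGADRO = 6.02214076e23   # atoms per mole
MOLAR_MASS_C = 12.011      # grams per mole of carbon
GRAMS_PER_PG = 1e15        # one petagram in grams

def carbon_atoms(petagrams):
    """Convert a mass of carbon in Pg to a number of carbon atoms."""
    return petagrams * GRAMS_PER_PG / MOLAR_MASS_C * AVOGADRO

# Reservoir sizes in Pg of carbon, as quoted in the text.
reservoirs = {
    "atmosphere": 600,
    "circulating carbon": 43_000,
    "whole Earth": 1.85e9,
}

LOG10_CHEMICAL_SPACE = 63  # the Bohacek et al. estimate

for name, pg in reservoirs.items():
    atoms = carbon_atoms(pg)
    gap = LOG10_CHEMICAL_SPACE - math.log10(atoms)
    print(f"{name}: {atoms:.1e} C atoms, {gap:.0f} orders of magnitude short of 10^63")
```

Even counting every carbon atom on the planet, the supply falls roughly 16 orders of magnitude short of one atom per candidate molecule.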

Conclusion

There are a few points I’d like to emphasize at the end of this chain of Fermi calculations. First, all the estimates here are incredibly rough, and could be off by multiple orders of magnitude without changing the primary conclusions. Second, the set of organic compounds considered as a search space is really big, and if you have a goal that involves picking a compound with a defined set of characteristics, you’ll want to do something other than brute-force search. Finally, in the estimate for the size of small-molecule chemical space, the bulk of the work (about two-thirds of our log-units) is done by the ability of carbon to form rings and branched chains. In this respect, no other element comes close to the variety we see with carbon-based compounds.[10] To the extent that molecular shape is relevant to processes we care about (like living systems we know about, and likely those we haven’t encountered yet) we should expect carbon-based compounds to play a significant role.

  1. ^

    It does exclude some fun edge cases like carbon tetrachloride and hexafluorobenzene, but any definition we choose will have to draw the line somewhere.

  2. ^

    “Possible” here is intended to mean only chemical entities that are isolable and stable for at least long enough to be characterized.

  3. ^

    That is, a protein with a molecular weight of 30,000 grams per mole. Mid-sized, as proteins go.

  4. ^

    Wikipedia gives a figure of 1.5 x 10^53 kg total mass of ordinary matter. Most of this is hydrogen atoms, which come 6 x 10^26 per kilogram. That gives 9 x 10^79 or ~10^80 total atoms. The existence of non-hydrogen atoms doesn’t change this significantly.

  5. ^

    Apologies for the paywalled article. I’ll quote the portion that interests us in full but wanted to include a link to the paper.

  6. ^

    This is far from the main concern of the paper. In fact, the calculation is a footnote to a figure late in the paper.

  7. ^

    600 Pg carbon = 6 x 10^14 kg C

    1 kg C = 83.333 mol C = 5 x 10^25 C atoms

    600 Pg C = 3 x 10^40 C atoms

  8. ^

    43,000 Pg carbon = 4.3 x 10^16 kg C

    1 kg C = 83.333 mol C = 5 x 10^25 C atoms

    43,000 Pg C = 2 x 10^42 C atoms

  9. ^

    1.85 x 10^9 Pg carbon = 1.85 x 10^21 kg C on earth

    1 kg C = 83.333 mol C = 5 x 10^25 C atoms

    1.85 x 10^21 kg C = 9.25 x 10^46 ~ 10^47 C atoms on earth

  10. ^

    Silicon can form some rings and chains analogous to carbon’s, but they tend to be less stable, in large part due to the reduced strength of Si-Si bonds relative to C-C bonds. Sulfur can form chains and rings of various sizes but doesn’t branch well and also suffers from relatively low S-S bond strength.