LW Biology 101 Introduction: Constraining Anticipation

Since the responses to my recent inquiry were positive, I’ve rolled up my sleeves and gotten started. Special thanks to badger for eir comment in that thread, as it inspired the framework used here.

My intent in the upcoming posts is to offer a practical overview of biological topics of both broad-scale importance and particular interest to the Less Wrong community. This will by no means be exhaustive (else I’d be writing a textbook instead, or more likely, you’d be reading one); instead I am going to attempt to sketch what amounts to a map of several parts of the discipline – where they stand in relation to other fields, where we are in the progress of their development, and their boundaries and frontiers. I’d like this to be a continually improving project as well, so I would very much welcome input on content relevance and clarity for any and all posts.

I will list relevant/useful references for more in-depth reading at the end of each post. The majority of in-text links will be used to provide a quick explanation of terms that may not be familiar or phenomena that may not be obvious. If the terms are familiar to you, you probably do not need to worry about those links. A significant minority of in-text links may or may not be purely for amusement.

It is a popular half-joke that biology is applied chemistry is applied physics is applied math. While it’s certainly necessary to apply all the usual considerations for a chemical system to a biological system or problem, there are some overall complications and themes that specifically (though not uniquely) apply to biological problems, and it is useful to keep them in mind.

1. Biological processes are stochastic.

Cellular-scale chemistry is an event-dense environment, but the abundance of most reactants is generally quite low. (Exceptions typically include oxygen, carbon dioxide, water, and small ions.) Beyond the basic consideration of abundance, there are additional layers of regulation that determine whether a given entity, usually a protein, can actually react at any given moment, as well as complicated geometries that further decrease the frequency of a given reaction. We must therefore treat the majority of reactions as discrete events and model them stochastically.
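
To make the ‘discrete and stochastic’ point concrete, here is a minimal sketch of a Gillespie-style stochastic simulation for a single decay reaction A → B. The molecule count, rate constant, and function name are made-up illustrative choices, not values or code from any real system.

```python
# Minimal Gillespie-style stochastic simulation of a single reaction A -> B.
# All parameters are illustrative placeholders, not measured values.
import random

def gillespie_decay(n_A=50, k=0.1, t_end=100.0):
    """Simulate A -> B, where each A molecule reacts at rate k per unit time."""
    t, trajectory = 0.0, [(0.0, n_A)]
    while n_A > 0 and t < t_end:
        propensity = k * n_A                  # total event rate right now
        t += random.expovariate(propensity)   # exponential wait until the next event
        n_A -= 1                              # one discrete molecule reacts
        trajectory.append((t, n_A))
    return trajectory

# Each run gives a different, jagged decay curve; only the average over many
# runs approaches the smooth exponential of a deterministic rate equation.
print(gillespie_decay()[-1])
```

At high copy numbers the stochastic trajectories converge toward that deterministic curve, which is part of why the bulk cultures in the next paragraph look so much better behaved.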

If we take a step up to the scale of cells in culture – to give a ballpark idea, we’re talking on the order of 10^6–10^9 cells/mL for yeast or bacteria – the overall flux of nutrients, waste, and other metabolites becomes much more stable across samples under uniform conditions. However, even with relatively well-behaved cells in liquid culture, sample variability is such that it is necessary to take replicate samples for each condition when using such cultures in an experiment.

Taking another (large) step up the complexity hierarchy to consider multicellular organisms (and let’s make them genetically identical, as in laboratory strains of fruit flies or worms), we now have aggregates of individual stochastic processes, which themselves exhibit stochasticity at the organism level. And, as you might expect, the same holds for bigger, more complicated, and genetically non-identical organisms.

The functional implications of this behavior are:

  • If we wish to model biochemical processes on a molecular scale, we must account for stochastic behavior.

  • Careful statistical analysis is intrinsically necessary for studying biological systems at all levels (a minimal sketch of what this looks like in practice follows this list).
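
As that sketch, here is what a minimal replicate-based comparison might look like. The triplicate values are invented (think of them as stand-ins for something like optical-density readings under two growth conditions); the point is only that replicates let you ask whether an apparent difference exceeds the sample-to-sample noise.

```python
# Hypothetical triplicate measurements for two growth conditions.
# The numbers are invented; the point is that replicates quantify the
# sample-to-sample variability an apparent difference must be judged against.
from statistics import mean, stdev
from scipy.stats import ttest_ind

condition_a = [0.82, 0.79, 0.85]   # e.g., OD600 readings, condition A
condition_b = [0.93, 0.88, 0.95]   # e.g., OD600 readings, condition B

t_stat, p_value = ttest_ind(condition_a, condition_b)
print(f"A: {mean(condition_a):.2f} ± {stdev(condition_a):.2f}")
print(f"B: {mean(condition_b):.2f} ± {stdev(condition_b):.2f}")
print(f"two-sample t-test p-value: {p_value:.3f}")
```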

2. Biological systems are complex networks, and variable deconvolution is nontrivial.

Stochastic processes aren’t especially difficult to model if you have access to accurate probability predictions for the set of possible events… but obtaining those accurate predictions is, for biological systems, painstaking at best.

If you’re trying to model a single node in a biochemical pathway – say, an enzyme catalyzing the cyclization of a linear cholesterol precursor – you have to consider the upstream and downstream reactions’ effect on that node, the interaction of your whole pathway with other biochemical pathways in the cell, and then the chemical environment of that cell, which could be as simple as a uniform flask of culture medium or as complex as a human tissue, which is within a human body, which… you get the idea. (This is somewhat hyperbolic for purposes of illustration. In reality, you can probably get away with assuming either a near-constant culture medium or some known dynamic cycle of states in a multicellular tissue, depending on the application you’re looking at.)

In other words, there are a lot of variables, and almost none of them are fully independent. Relative to non-biological systems, it therefore takes a great deal of time and resources to characterize these variables. Even for our most-studied, favorite single-celled organisms, whose genomes we’ve sequenced and whose metabolisms we’ve begun to model, there are still huge blank areas in our lists of variable definitions.
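
As a toy illustration of why this deconvolution is hard, here is a sketch of fitting a response to two nearly collinear ‘inputs’ (imagine two tightly coupled metabolite concentrations). The data are synthetic and the setup is hypothetical; the point is that strongly coupled variables leave their individual coefficients poorly determined even when the overall fit looks fine.

```python
# Synthetic example: two strongly coupled predictors are nearly impossible to
# disentangle, even though the combined model fits the observations well.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)            # almost a copy of x1 (coupled variable)
y = 2.0 * x1 + 3.0 * x2 + 0.1 * rng.normal(size=n)

X = np.column_stack([x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted coefficients:", coeffs)          # individually unreliable...
print("condition number:", np.linalg.cond(X))  # ...because the problem is ill-conditioned
```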

I’ll be discussing experimental paradigms and difficulties with specific systems in later posts, as well as the methods used to deal with them.

3. Due to (1) and (2), modeling efforts are heavily limited by computing power.

This is worth mentioning here, though it is fairly self-explanatory. One specific consequence of the interconnectedness of system branches deserves separate note: events occur on a vast range of time scales, and this time-scale diversity makes system-level modeling a very stiff endeavor (in the numerical-solver sense as well as the everyday one).
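
For a sense of what ‘stiff’ means in practice, here is a toy two-species system whose rate constants differ by six orders of magnitude, roughly the situation when a fast binding step feeds a slow downstream conversion. The rate constants are arbitrary illustrative numbers; the point is that an implicit, stiffness-tolerant solver is needed to cover the whole time range efficiently.

```python
# Toy stiff system: A converts to B on a millisecond time scale, and B then
# decays over ~1000 s. Rate constants are arbitrary, chosen only to span many
# orders of magnitude.
from scipy.integrate import solve_ivp

k_fast, k_slow = 1e3, 1e-3   # assumed rate constants (1/s)

def rhs(t, y):
    a, b = y
    return [-k_fast * a, k_fast * a - k_slow * b]

# An implicit method (BDF) handles the disparate time scales; an explicit solver
# would be forced into millisecond-sized steps across the entire 10^4 s window.
sol = solve_ivp(rhs, (0.0, 1e4), [1.0, 0.0], method="BDF")
print(sol.y[:, -1])
```

Scaling this up from two species to a genome-scale network is where the computing-power limitation really bites.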

I’ll go into some of the more successful and potentially successful modeling strategies for various systems in later posts.

4. Information transfer is error-prone in biological systems.

Genes are replicated by a sequential polymerization reaction that constructs a new strand of DNA using the old one as a template. Each time a monomer is added to the new strand, there is a small* chance that the incorrect type of monomer will be added, and a smaller chance that the error will not be recognized and corrected by proofreading mechanisms. Since genes are on the order of thousands of monomer units (‘bases’) long, the chance that at least one such mutation occurs somewhere over the course of the cell’s lifetime is appreciable. (Mutation probabilities differ by organism, by gene, and in accordance with a host of other factors.)
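
As a back-of-the-envelope sketch of why a ‘small’ per-base chance still adds up, here is the standard at-least-one-event calculation. The per-base error rate, gene length, and number of divisions are assumed order-of-magnitude placeholders, not measurements for any particular organism.

```python
# Back-of-the-envelope: chance of at least one uncorrected mutation in a gene.
# All three numbers below are assumed order-of-magnitude placeholders.
p_error = 1e-9        # chance a given base is mis-copied AND escapes proofreading
gene_length = 3_000   # bases in a typical gene
divisions = 10_000    # replications over a cell lineage's history

p_at_least_one = 1 - (1 - p_error) ** (gene_length * divisions)
print(f"P(at least one mutation) ≈ {p_at_least_one:.1%}")   # ~3% with these numbers
```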

Aside from this familiar form of error in the genetic code itself, abnormalities can also occur in how DNA is partitioned between the daughter cells of a dividing cell, and, on a less permanent level, there are many phenomena that amount to ‘miscommunication’ between parts of a cell, or between whole cells or organs.

Functionally speaking, this rather patchy scheme of information fidelity gives rise to the phenomenon of evolution, a great many useful experimental methods oriented around inducing mutations, and the occasional thorn in the side of a researcher who has suddenly found eir cell line has lost a trait that it needed to have or gained one it didn’t. On the broad scale, it also makes the study of biological systems something of a moving target, particularly in mutation-prone systems such as human pathogens.

* This chance has actually been estimated for various situations, and tables of these estimates are used heavily in mapping evolution. (NB: This inference is descriptive only; it is NOT predictive.)

5. Biological processes are limited to life-sustaining conditions.

…Or, in chemical engineering terms, it is not advisable to blow up your reactor.

The kinds of chemistry we can convince cells to do for us are those that are not toxic to the cell, and that do not completely overwhelm the cell’s ability to handle its own biochemical needs. There are ways to partially circumvent this – you can sometimes get away with slightly toxic products in an engineered metabolic pathway, or, if you have to completely hijack some part of the cell’s essential machinery, you can sometimes supply whatever it’s missing from outside – but it’s a rule that can only be bent so far before you’ve got dead cells on your hands. (And chances are, unless they were cancer cells, that’s not what you wanted.)

Biological systems also exhibit a high degree of organization, allowing the partitioning of microenvironments necessary to support the full spectrum of biochemical reactions. Maintaining this organization is just as vital as temperature and pH homeostasis and the avoidance of toxin buildup.

6. Biological processes are transport-constrained.

For reasons similar to, but more complex than, those in (1), chemical transport is a Big Deal in biology from the cellular level all the way to the clinical setting. Due to (5), we can’t just put everything in a blender and assume perfect mixing (though I’m sure some people would be happy to try), so to a certain extent on a cellular level (depending on the complexity of your cells and what you’re trying to make them do), and to an all-consuming extent on an organism level, biological problems contain transport problems.

The easiest illustrative example is cancer and conventional chemotherapy. You’ve got a patient with a tumor, and they’re receiving chemotherapy. The chemotherapeutic chemicals are injected into the bloodstream, which they then ride through the body to the tumor, and get to work. Except… since they took the scenic route getting there, they’ve also come into contact with a lot of erstwhile healthy tissue, which they have also attacked, producing the host of nasty side effects that comes with chemo. You could inject the drugs directly into the tumor or the tissue surrounding it, but then you’ve got to hope the drugs manage to diffuse far enough into the tumor to do some good despite the fact that they aren’t riding any blood vessels. This is the sort of transport-focused engineering problem that must be solved in some capacity for nearly all clinical applications.
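
To put a rough number on the ‘diffuse far enough’ problem, here is a characteristic-diffusion-length estimate, L ≈ √(D·t). The diffusivity and exposure time are assumed ballpark figures for a small-molecule drug in tissue, chosen only to show the scale mismatch with a centimeter-sized tumor.

```python
# Rough scale estimate: how far can a drug diffuse into tissue without blood flow?
# D and t are assumed ballpark values, not measurements for any specific drug.
import math

D = 1e-10      # diffusivity of a small molecule in tissue, m^2/s (assumed)
t = 3600.0     # one hour of exposure, in seconds

L = math.sqrt(D * t)           # characteristic diffusion length
print(f"~{L * 1000:.1f} mm")   # ~0.6 mm: far short of a centimeter-scale tumor
```

Without blood vessels to piggyback on, local injection therefore struggles to reach the interior of a large tumor, which is exactly the trade-off described above.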

Given these considerations, much of our most productive, ground-breaking research** in biology and bioengineering today is focused on:

  • Finding new ways to model systems of interest

  • Finding more efficient ways to update our existing models and understanding (more efficient variable characterization)

  • Designing streamlined, well-behaved systems based on our emerging understanding of how all these processes work individually and link together (‘plug and play’ biology)

**Reflects my engineering-slanted opinion on the future of biology, as well as that of most people who are doing bioinformatics and are excited about it, but it is open to contrary opinion. ‘Fastest-developing’ would perhaps be a better description.

Useful/interesting references consulted for this section:

Lehninger Principles of Biochemistry, 4th ed., by David L. Nelson and Michael M. Cox

Molecular Biology of the Cell, 4th ed., by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter

Receptors: Models for Binding, Trafficking, and Signaling, by Douglas A. Lauffenburger and Jennifer J. Linderman