8 examples informing my pessimism on uploading without reverse engineering

(If you’ve already read everything I’ve written, you’ll find this post pretty redundant. See especially my old posts Building brain-inspired AGI is infinitely easier than understanding the brain, and Randal Koene on brain understanding before whole brain emulation, and Connectomics seems great from an AI x-risk perspective. But I’m writing it anyway mainly in response to this post from yesterday.)

1. Background /​ Context

1.1 What does uploading (a.k.a. Whole Brain Emulation (WBE)) look like with and without reverse-engineering?

There’s a view that I seem to associate with Davidad and Robin Hanson, along with a couple other people I’ve talked to privately. (But I could be misunderstanding them and don’t want to put words in their mouths.) The view says: if we want to do WBE, we do not need to reverse-engineer the brain.

For an example of what “reverse-engineering the brain” looks like, I can speak from abundant experience: I often spend all day puzzling over random questions like: Why are there oxytocin receptors in certain mouse auditory cortex neurons? Like, presumably Evolution put those receptors there for a reason—I don’t think that’s the kind of thing that appears randomly, or as an incidental side-effect of something else. (Although that’s always a hypothesis worth considering!) Well, what is that reason? I.e., what are those receptors doing to help the mouse survive, thrive, etc., and how are they doing it?

…And once I have a working hypothesis about that question, I can move on to hundreds or even thousands more “why and how” questions of that sort. I seem to find the activity of answering these questions much more straightforward and tractable (and fun!) than do most other people—you can decide for yourself whether I’m unusually good at it, or deluded.

For an example of what uploading without reverse-engineering would look like, I think it’s the idea that we can figure out the input-output relation of each neuron, and we can measure how neurons are connected to each other, and then at the end of the day we can simulate a human brain doing whatever human brains do.

Here’s Robin Hanson arguing for the non-reverse-engineering perspective in Age of Em:

The brain does not just happen to transform input signals into state changes and output signals; this transformation is the primary function of the brain, both to us and to the evolutionary processes that designed brains. The brain is designed to make this signal processing robust and efficient. Because of this, we expect the physical variables (technically, “degrees of freedom”) within the brain that encode signals and signal-relevant states, which transform these signals and states, and which transmit them elsewhere, to be overall rather physically isolated and disconnected from the other far more numerous unrelated physical degrees of freedom and processes in the brain. That is, changes in other aspects of the brain only rarely influence key brain parts that encode mental states and signals.

We have seen this disconnection in ears and eyes, and it has allowed us to create useful artificial ears and eyes, which allow the once-deaf to hear and the once-blind to see. We expect the same to apply to artificial brains more generally. In addition, it appears that most brain signals are of the form of neuron spikes, which are especially identifiable and disconnected from other physical variables.

If technical and intellectual progress continues as it has for the last few centuries, then within a millennium at the most we will understand in great detail how individual brain cells encode, transform, and transmit signals. This understanding should allow us to directly read relevant brain cell signals and states from detailed brain scans. After all, brains are made from quite ordinary atoms interacting via rather ordinary chemical reactions. Brain cells are small, and have limited complexity, especially within the cell subsystems that manage signal processing. So we should eventually be able to understand and read these subsystems.

As we also understand very well how to emulate any signal processing system that we can understand, it seems that it is a matter of when, not if, we will be able to emulate brain cell signal processing. And as the signal processing of a brain is the sum total of the signal processing of its brain cells, an ability to emulate brain cell signal processing implies an ability to emulate whole brain signal processing, although at a proportionally larger cost.

In other words:

  • For uploading-without-reverse-engineering (the thing I’m pessimistic about), imagine source code that looks vaguely like “Neuron 782384364 has the following intrinsic neuron property profile: {A.28, B.572, C.37, D.1, E.49,…}. It is connected to neuron 935783951 through synapse type {Z.58,Y.82,…} and to neuron 572379349 through synapse type…”.

  • Whereas, for uploading-with-reverse-engineering (the thing I’m more optimistic about), imagine source code that looks vaguely like an unusually complicated ML source code repository, full of human-legible learning algorithms, and other processes and stuff, but where everything has sensible variable names like “value function” and “prediction error vector” and “immune system status vector”. And then the learning algorithms are all initialized with “trained models” gathered from a scan of an actual human brain, and the parameters of those trained models are giant illegible databases of numbers, comprising this particular person’s life experience, beliefs, desires, etc.
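To make the contrast concrete, here’s a minimal sketch of what the two styles of “source code” might look like. Everything here is illustrative and invented (the neuron IDs and parameter letters echo the caricature above; the class and method names are my hypothetical placeholders, not drawn from any real emulation project):

```python
from dataclasses import dataclass, field

# Style 1: uploading WITHOUT reverse-engineering -- a flat, opaque table
# of per-neuron parameters and typed synapses, none individually meaningful.
@dataclass
class OpaqueNeuron:
    params: dict    # e.g. {"A": 0.28, "B": 0.572, ...} -- meaning unknown
    synapses: list  # (target_neuron_id, synapse_type_params) pairs

blob = {
    782384364: OpaqueNeuron(
        params={"A": 0.28, "B": 0.572, "C": 0.37},
        synapses=[(935783951, {"Z": 0.58, "Y": 0.82})],
    ),
    # ... billions more entries like this
}

# Style 2: uploading WITH reverse-engineering -- human-legible algorithms
# with sensible variable names, initialized with illegible scan-derived
# "trained model" weights.
@dataclass
class ReverseEngineeredBrain:
    value_function: list = field(default_factory=list)          # learned weights
    prediction_error_vector: list = field(default_factory=list)
    immune_system_status_vector: list = field(default_factory=list)

    def update_value_function(self, prediction_error: float) -> None:
        """A legible learning rule operating on illegible numbers."""
        self.value_function = [w + 0.01 * prediction_error
                               for w in self.value_function]

brain = ReverseEngineeredBrain(value_function=[0.1, -0.3, 0.7])
brain.update_value_function(prediction_error=2.0)
```

The point of the sketch: in style 1, every entry is load-bearing and nothing has a name you can reason about; in style 2, the *algorithms* are legible even though the *weights* are not.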

1.2 Importantly, both sides agree on “step 1”: Let’s go get us a human connectome!

The reverse-engineering route that I prefer is:

  • Step 1: Measure a human connectome. The more auxiliary data, the better.

  • Step 2: Reverse-engineer how the human brain works.

  • Step 3: Now that we understand how everything works, we might recognize that our scan was missing essential data, in which case, we go back and measure it.

  • Step 4: Uploads!

The non-reverse-engineering route that I’m pessimistic about is:

  • Step 1: Measure a human connectome. The more auxiliary data, the better.

  • Step 2: Also do lots of measurements of neurons, organoids, etc. to fully characterize the input-output functions of neurons.

  • Step 3: Maybe iterate? I’m not sure of the details.

  • Step 4: Uploads!

Anyway, I want to emphasize that we all agree on “step 1”. Also, I think “step 1” is the hard, slow, and expensive part, where maybe we’re building out giant warehouses full of microscopes, or whatever. So let’s do it!

If there are two disjunctive positive stories about what happens after step 1, on which people disagree, then fine! That’s all the more reason to do step 1!

(More discussion in my post Connectomics seems great from an AI x-risk perspective.)

1.3 This is annoying because I want to believe in brain uploading without reverse engineering

Let’s say we had an uninterpretable “binary blob” that could perfectly simulate a particular adult human who is very smart and nice. But there’s nothing else you can do with it. If you change random bits and try to run it, it mostly just breaks.

In terms of AGI safety, that’s a pretty great situation! We can run large numbers of sped-up people, and have them take their time to build aligned AGI, or whatever.

By contrast, let’s say we have a human upload by the reverse-engineering route. We can do the same thing, with large numbers of sped-up people. Or we can start doing extraordinarily dangerous experiments where we modify the upload to make it more powerful and enterprising. (What if we train a new one but where the cortex has 10× more neurons? And we replace all the normal innate drives with just a drive to maximize share price? Etc.)

Maybe you’re thinking: “Well, OK, but we’ll do the safe thing, not the dangerous thing.” But then I say: “What do you mean by ‘we’?” If there’s a top-secret project with good internal controls, then OK, sure. But if the reverse-engineering results get published, people will do all sorts of crazy experimentation. And keeping secrets for a long time is hard. I figure that, if we can get a period where we do have uploads but don’t have non-upload brain-like-AGI, then this period would last at most a couple years, absent weird possibilities like “the uploads launch a coup”. More discussion of this is in my connectomics post.

So, I wish I believed that there was a viable path to making an uninterpretable binary blob that emulates a particular human, and can do nothing else. Alas! I don’t believe that.

2. Main text: 8 examples informing my pessimism of emulating the brain without understanding it

2.1 The mechanism by which oxytocin neurons emit synchronized pulses for milk let-down

Oxytocin neurons in the supraoptic nucleus of the hypothalamus have synchronous bursts every few minutes during suckling, which dumps a pulse of oxytocin into the bloodstream that triggers the “milk ejection reflex” (a.k.a. “milk let-down”). In his wonderful book on the hypothalamus (my review here), Gareth Leng devotes the better part of 30 pages to the efforts of his group and others to figure out how the neurons pulse:

The milk-ejection reflex had seemed to be a product of a sophisticated neuronal network that transformed a fluctuating sensory input (from the suckling of hungry young) into stereotyped bursts that were synchronized among the entire population of oxytocin cells.

…These experiments were the first convincing demonstration of a physiological role for any peptide in the brain. They did not explain the milk-ejection reflex but defined the questions that had to be answered before it could be explained. Building that explanation took another twenty years. The questions posed had no precedent in our understanding. Where did the oxytocin that was released in the supraoptic nucleus come from, if not from synapses? What triggered its release, if that release was not governed by spiking activity? What synchronized the oxytocin cells, if they were not linked by either synapses or electrical junctions? [emphasis added]

For spoilers: here’s his 2008 computational model, involving release of both oxytocin and endocannabinoids out of dendrites (contrary to the standard story where dendrites are inputs), volume transmission (as opposed to synaptic transmission), and some other stuff.

(I’m pretty sure that the word “explanation” in the quote above should be understood as “reproduction of the high-level phenomenon in a bottom-up model”, as opposed to “intuitive conceptual explanation of how that reproduction works”. I think the 2008 computational model was the first model for both of those.)

The moral of the story (I claim) is: If we’re trying to reverse-engineer the high-level behavior, it’s easy! To a first approximation, we can just say: “Well I don’t know exactly how, but these cells evidently make short pulses of oxytocin every few minutes when there are such-and-such indications of suckling”. Whereas if we’re trying to reproduce the high-level behavior (without knowing what it is) starting from properties of the individual neurons involved, then this is what seems to have taken these researchers decades of work, despite these neurons being unusually easy to access experimentally, and despite the researchers knowing exactly what they’re trying to explain.
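Here’s what I mean by the reverse-engineered description being “easy”: the high-level behavior fits in a few lines of code. This is a deliberately cartoonish sketch of “these cells evidently make short pulses every few minutes during suckling”, not Leng’s 2008 bottom-up model; the threshold, noise, and timescale are all invented for illustration:

```python
import random

def milk_ejection_bursts(suckling, threshold=180.0, seed=0):
    """Toy top-down model: suckling input 'primes' the oxytocin cell
    population (a shared, slowly accumulating variable); when priming
    crosses a threshold, the whole population bursts in synchrony and
    the priming resets.

    suckling: per-second input strengths; returns burst times (seconds).
    """
    rng = random.Random(seed)
    priming, bursts = 0.0, []
    for t, s in enumerate(suckling):
        priming += s * rng.uniform(0.8, 1.2)  # noisy accumulation
        if priming >= threshold:
            bursts.append(t)   # synchronized population burst
            priming = 0.0      # cycle restarts
    return bursts

# Roughly steady suckling for 20 simulated minutes gives a burst
# every few minutes:
bursts = milk_ejection_bursts([1.0] * 1200)
intervals = [b - a for a, b in zip(bursts, bursts[1:])]
```

Writing this down took minutes. Deriving the same pulsing behavior bottom-up, from dendritic release, volume transmission, and the rest, is the thing that took decades.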

2.2 Neurons with spike-timing-dependent plasticity, but the “timing” can involve 8-hour-long gaps

Conditioned Taste Aversion (CTA) is when you eat or drink something at time t₁, then get nauseous at a later time t₂, and wind up with an aversion to what you ate or drank. The interesting thing is that the aversion does not form if t₂ is just a few seconds or minutes after t₁, nor if it’s a few days after t₁, but it does form if it’s a few hours after t₁.

Adaikkan & Rosenblum (2015) found a mechanism that seems to explain this. It involves neurons in the insular cortex. Novel tastes activate two molecular mechanisms (mumble mumble phosphorylation) that start 15-30 minutes after the taste, and unwind after 3 hours and 8 hours respectively. Presumably, these then interact with later nausea-related signals to enable CTA.

The moral of the story (I claim) is: If you try to characterize neurons in a controlled setting under the assumption that their behavior now depends on what was happening in the previous few seconds, but not what was happening five hours ago, then you might find that your dataset makes no sense.

As in the rest of this section, this specific example might not seem important—who cares if our uploads don’t have conditioned taste aversion?—but I suspect that this is one example of a broader category. For example, in the course of normal thinking, I think it’s easier to recall and use a concept if you were thinking about it 3 hours ago than if you haven’t thought about it since yesterday. Capturing this phenomenon may be important for enabling our uploads to do good scientific research etc. I seem to recall reading a paper that suggested a molecular mechanism underlying this phenomenon (or something like it), but I can’t immediately find it.

2.3 Really weird neurons and synapses

There’s a funny thing called a “synaptic triad” in the thalamus. It’s funny because it’s three neurons connecting rather than the usual two, and it’s also funny because one of those neurons is “backwards”, with dendrites being the output rather than the input.

Image source: A talk by Murray Sherman

I suspect that there are many more equally weird things in the brain—this just happens to be one that I’ve come across.

The moral of the story (I claim) is: Let’s say that the uploading-without-understanding route involves separately characterizing N different things (e.g. each different type of neuron), and the reverse-engineering route involves separately characterizing M different things (e.g. each different functional component /​ circuit comprising how the brain’s algorithms work). Maybe you have the idea in your head that N<<M, because neurons come in a small number of types, and those types are configured into a dizzying variety of little machines throughout the brain that do different things.

If so, what I’m suggesting through this example is that maybe instead N≳M, because there is also a dizzying variety of weird low-level components—i.e., N is not as small as you might think.

Separately, I think M can’t be more than maybe hundreds to low thousands, because there are only like 20,000 protein-coding genes, and they have to not only specify all the irreducible complexity of the brain’s algorithm, but also build everything else in the brain and body.

2.4 A book excerpt on simulating two particular circuits

I’m a bit hesitant to include this because I haven’t checked it, but if you trust the book The Idea of the Brain, here’s an excerpt:

…Despite having a clearly established connectome of the thirty-odd neurons involved in what is called the crustacean stomatogastric ganglion, [Eve] Marder’s group cannot yet fully explain how even some small portions of this system function. …in 1980 the neuroscientist Allen Selverston published a much-discussed think piece entitled “Are Central Pattern Generators Understandable?”...the situation has merely become more complex in the last four decades...The same neuron in different [individuals] can also show very different patterns of activity—the characteristics of each neuron can be highly plastic, as the cell changes its composition and function over time...

…Decades of work on the connectome of the few dozen neurons that form the central pattern generator in the lobster stomatogastric system, using electrophysiology, cell biology and extensive computer modelling, have still not fully revealed how its limited functions emerge.

Even the function of circuits like [frog] bug-detecting retinal cells—a simple, well-understood set of neurons with an apparently intuitive function—is not fully understood at a computational level. There are two competing models that explain what the cells are doing and how they are interconnected (one is based on a weevil, the other on a rabbit); their supporters have been thrashing it out for over half a century, and the issue is still unresolved. In 2017 the connectome of a neural substrate for detecting motion in Drosophila was reported, including information about which synapses were excitatory and which were inhibitory. Even this did not resolve the issue of which of those two models is correct.

The moral of the story (I claim) is: Building a bottom-up model that reproduces high-level behavior from low-level component neurons is awfully hard, even if we know what the high-level behavior is, and can thus iterate and iterate when our initial modeling attempts aren’t “working”. If we don’t know the high-level behavior we’re trying to explain, and thus don’t know whether our initial modeling attempts are sufficient or not—which is a necessary part of the uploading-without-understanding plan—we should expect it to be that much harder.

2.5 Failures to upload C. elegans

C. elegans only has 302 neurons, and abundant data. But we still can’t reproduce all its high-level behavior in a bottom-up model, if I understand correctly.

I’m actually pretty unfamiliar with the details, but see discussion at Whole Brain Emulation: No Progress on C. elegans After 10 Years (including the comments section).

I recall hearing somewhere [EDIT: there’s a citation here] that part of the reason that this has been a challenge is the thing I’ll talk about next:

2.6 Neurons can permanently change their behavior via storing information in the nucleus (e.g. gene expression)

See e.g. papers by Sam Gershman, David Glanzman, Randy Gallistel. Writing this section feels weird for me, because usually I’m arguing the opposite side on this topic: I subscribe to the conventional wisdom that human learning mainly involves permanent changes in and around synapses, not in the cell nucleus, and think the people in the previous sentence go way too far in their heterodox case to the contrary. But “information storage in the nucleus is not the main story in human intelligence” (which I believe) is different from “information storage in the nucleus doesn’t happen at all, or has such a small effect that we can ignore it and still get the same high-level behavior” (which I don’t believe).

The moral of the story (I claim) is: If you don’t know what’s going on in the brain and how, then you don’t know if you’re measuring the right things, until ding, your simulation reproduces the high-level behavior. Before that agreement happens, you just have a bad model and no clue how to fix it. (Even after you’re getting agreement, you don’t know if the agreement is of sufficiently high fidelity, if you don’t know what the component in question is “supposed” to do in the larger design.) I’m not sure if it’s even possible to measure gene expression etc. in human brain slices. I doubt it’s easy! But if we understand the role of gene expression in how such-and-such part of the brain works, we can reason about what if anything we’re losing by leaving it out, and see if there’s any easier workaround. If we don’t understand what it’s doing, then we’re sitting there staring at a simulation that doesn’t match the data, and we don’t know how critical a problem that is, and whether we’re missing anything else.
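The failure mode here can be illustrated with a toy model (all the dynamics below are invented for illustration—this is the shape of the problem, not any real cell’s biophysics): a cell stores a bit in “gene expression” that permanently changes its response, so an emulation fit only to fast membrane/synapse variables matches at first and then silently diverges.

```python
class CellWithNuclearState:
    """Toy 'real' cell: repeated strong inputs flip a permanent
    gene-expression switch that doubles the cell's gain."""
    def __init__(self):
        self.gain = 1.0
        self.strong_inputs_seen = 0

    def respond(self, x):
        if x > 5.0:
            self.strong_inputs_seen += 1
            if self.strong_inputs_seen >= 3:
                self.gain = 2.0   # nuclear switch; never reverts
        return self.gain * x

class NaiveEmulation:
    """Fit only to short recordings, so it never saw the slow switch."""
    def respond(self, x):
        return 1.0 * x

real, emu = CellWithNuclearState(), NaiveEmulation()
inputs = [1.0, 6.0, 6.0, 6.0, 1.0]
pairs = [(real.respond(x), emu.respond(x)) for x in inputs]
# The two agree on every input until the switch flips, after which the
# emulation is wrong -- and you have no idea which missing variable to
# blame, or whether the mismatch even matters.
```

If you knew the switch existed and what it was for, you could decide whether to measure it, model it, or argue it’s ignorable. Without that understanding, you just have a mismatch.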

2.7 Glial cell gene expression that changes over hours

I was just reading about this yesterday:

Astrocytes in the [suprachiasmatic nucleus of the hypothalamus], for instance, show rhythmic expression of clock genes and influence circadian locomotor behavior (25) [source]

This is kinda a combination of Section 2.3 above (there are lots of idiosyncratic low-level components to characterize, as opposed to lots of circuits made from the same basic neuronal building blocks) and Section 2.6 above (you need to be simulating gene expression if you want to get the right high-level behavior).

2.8 Metabotropic (as opposed to ionotropic) receptors can have almost arbitrary effects over arbitrary timescales

If I understand correctly, the brain uses lots of ionotropic receptors, whose effects are to immediately and locally change the flow of ions into a neuron. Great! That’s easy to model.

Unfortunately, the brain also uses lots of metabotropic receptors (a.k.a. G-protein-coupled receptors), whose effects are—if I understand correctly—extraordinarily variable, indeed almost arbitrary. Basically, when they attach to a ligand, they then set off a signaling cascade, which can ultimately have pretty much any effect on the cell over any timescale. It might change the neuron’s synaptic plasticity rules, it might change gene expression, it might increase or decrease the cell’s production of neuropeptides, you name it. (If it’s not already obvious, these 8 examples are not mutually exclusive—many of the previous subsections involve metabotropic receptors.)
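As modeling problems, the two receptor classes look something like this sketch (the class, field, and cascade names are all invented for illustration): an ionotropic receptor is a fixed local rule, while a metabotropic receptor is, in effect, an arbitrary callback on the whole cell’s state.

```python
class Cell:
    def __init__(self):
        self.membrane_current = 0.0
        self.plasticity_rate = 0.01
        self.neuropeptide_production = 1.0

def ionotropic(cell, ligand_bound):
    # Easy case: immediate, local, stereotyped effect on ion flow.
    if ligand_bound:
        cell.membrane_current += 1.0

def metabotropic(cell, ligand_bound, cascade):
    # Hard case: binding sets off a signaling cascade that may do
    # nearly anything to the cell, on any timescale.
    if ligand_bound:
        cascade(cell)

cell = Cell()
ionotropic(cell, True)
# One metabotropic receptor might retune synaptic plasticity; another
# might change what the cell manufactures. Each cascade has to be
# characterized experimentally, per receptor, per cell type.
metabotropic(cell, True, lambda c: setattr(c, "plasticity_rate", 0.05))
metabotropic(cell, True,
             lambda c: setattr(c, "neuropeptide_production", 3.0))
```

The ionotropic function is one line you write once; the metabotropic `cascade` argument is a stand-in for an open-ended experimental program.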

The moral of the story (I claim) is: If your goal is to get the right high-level behavior using a bottom-up model of low-level components, then you presumably need to figure out what all these signaling cascades are, experimentally, for every neuron with metabotropic receptors. Again I’m not an expert, but that seems very hard.