Note that assumption 4 also needs to include a claim that we figure out the “secret sauce” sooner than other paths to AGI, despite lots of effort being put into them already.
Yup, “time until AGI via one particular path” is always an upper bound to “time until AGI”. I added a note, thanks.
These seem easily like the load-bearing part of the argument; I agree the stuff you listed follows from these assumptions but why should these assumptions be true?
The only thing I’m arguing in this particular post is “IF assumptions THEN conclusion”. This post is not making any argument whatsoever that you should put a high credence on the assumptions being true. :-)
I think either “Whole Brain Emulation” or “Mind Uploading” would be a fine tag for the union of the two. How does merging or changing-into-wiki-only work? Does an admin have to do it?
The “Whole Brain Emulation” tag and the “Mind Uploading” tag seem awfully similar, and in particular, there currently seems to be no rhyme or reason to which articles are tagged with which of these two tags. Maybe they should be merged? (Sorry if I’m posting this in the wrong place.)
it’s not fair to treat all seconds of life as equally influential/important for learning
I agree and didn’t mean to imply otherwise.
In terms of what we’re discussing here, I think it’s worth noting that there’s a big overlap between “sensitive windows in such-and-such part of the cortex” and “the time period when the data is external not synthetic”.
O’Reilly (1,2) simulated visual cortex development, and found that their learning algorithm flailed around and didn’t learn anything, unless they set it up to learn the where pathway first (with the what pathway disconnected), and only connect up the what pathway after the where pathway training has converged to a good model. (And they say there’s biological evidence for this.) (They didn’t have any retinal waves, just “real” data.)
As that example illustrates, there’s always a risk that a randomly-initialized model won’t converge to a good model upon training, thanks to a bad draw of the random seed. I imagine that there are various “tricks” that reduce the odds of this problem occurring—i.e. to make the loss landscape less bumpy, or something vaguely analogous to that. O’Reilly’s “carefully choreographed (and region-dependent) learning rates” is one such trick. I’m very open-minded to the possibility that “carefully choreographed synthetic data” is another such trick.
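For concreteness, here’s a minimal PyTorch-style sketch of that kind of staged “trick”; the module names, sizes, and two-stage split are all my invention, not O’Reilly’s actual (much more biologically detailed) model:

```python
import torch
import torch.nn as nn

# Hypothetical two-pathway model: a shared "retina" encoder feeding
# a "where" (dorsal) head and a "what" (ventral) head.
class TwoPathwayNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.where_head = nn.Linear(256, 2)   # e.g. predicts (x, y) location
        self.what_head = nn.Linear(256, 10)   # e.g. predicts object identity

    def forward(self, x):
        h = self.encoder(x)
        return self.where_head(h), self.what_head(h)

net = TwoPathwayNet()

# Stage 1: train encoder + "where" head only; "what" head stays disconnected.
stage1_params = list(net.encoder.parameters()) + list(net.where_head.parameters())
opt1 = torch.optim.Adam(stage1_params)
# ... train on location targets until the "where" model converges ...

# Stage 2: only now connect up and train the "what" head, building on the
# converged "where" features.
opt2 = torch.optim.Adam(net.what_head.parameters())
# ... train on identity targets ...
```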
Anyway, I don’t particularly object to the idea “synthetic data is useful, and plausibly if you take an existing organism and remove its synthetic data it would get messed up”. I was objecting instead to the idea “synthetic data is a major difference between the performance of brains and deep RL, and thus maybe with the right synthetic data pre-training, deep RL would perform as well as brains”. I think the overwhelming majority of training on human brains involves real data—newborns don’t have object permanence or language or conceptual reasoning or anything like that, and presumably they build all those things out of a diet of actual not synthetic data. And even if you think that the learning algorithm of brains and deep RL is both gradient descent, the inference algorithm is clearly different (e.g. brains use analysis-by-synthesis), and the architectures are clearly different (e.g. brains are full of pairs of neurons where each projects to the other, whereas deep neural nets almost never have that). These are two fundamental differences that persist for the entire lifetime / duration of training, unlike synthetic data which only appears near the start. Also, the ML community has explored things like deep neural net weight initialization and curriculum learning plenty, I would just be very surprised if massive transformative performance improvements (like a big fraction of the difference between where we are and AGI) could come out of those kinds of investigation, as opposed to coming out of different architectures and learning algorithms and training data.
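To illustrate the reciprocal-connections point (a toy contrast I made up, not anyone’s actual cortex model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)          # input drive to population A

# Feedforward (typical deep net): population A projects to B, full stop.
W_ab = rng.standard_normal((n, n)) * 0.1
b_ff = np.tanh(W_ab @ x)            # one pass and we're done

# Reciprocal (ubiquitous in cortex): A projects to B AND B projects back to A,
# so the activations have to be found by settling toward a fixed point.
W_ba = rng.standard_normal((n, n)) * 0.1
a, b = x.copy(), np.zeros(n)
for _ in range(20):                 # iterate until (approximately) settled
    b = np.tanh(W_ab @ a)
    a = np.tanh(x + W_ba @ b)       # feedback from B reshapes A's own activity
```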
The part I’ll agree with is: If we look at a dial, we can ask the question:
If there’s an AGI with a safety-capabilities tradeoff dial, to what extent is the dial’s setting externally legible / auditable to third parties?
More legible / auditable is better, because it could help enforcement.
I agree with this, and I have just added it to the article. But I disagree with your suggestion that this is counter to what I wrote. In my mind, it’s an orthogonal dimension along which dials can vary. I think it’s good if the dial is auditable, and I think it’s also good if the dial corresponds to a very low alignment tax rate.
I interpret your comment as saying that the alignment tax rate doesn’t matter because there will be enforcement, but I disagree with that. I would invoke an analogy to actual taxes. It is already required and enforced that individuals and companies pay (normal) taxes. But everyone knows that a 0.1% tax on Thing X will have a higher compliance rate than an 80% tax on Thing X, other things equal.
After all, everyone is making decisions about whether to pay the tax, versus not pay the tax. Not paying the tax has costs. It’s a cost to hire lawyers that can do complicated accounting tricks. It’s a cost to run the risk of getting fined or imprisoned. It’s a cost to pack up your stuff and move to an anarchic war zone, or to a barge in the middle of the ocean, etc. It’s a cost to get pilloried in the media for tax evasion. People will ask themselves: are these costs worth the benefits? If the tax is 0.1%, maybe it’s not worth it, maybe it’s just way better to avoid all that trouble by paying the tax. If the tax is 80%, maybe it is worth it to engage in tax evasion.
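Here’s that cost-benefit calculation as a toy computation (all numbers invented):

```python
# Toy decision rule: comply iff tax owed < total cost of evading.
income = 1_000_000
evasion_cost = 60_000   # lawyers + expected fines + relocation + reputation

for tax_rate in (0.001, 0.80):
    tax_owed = tax_rate * income
    choice = "comply" if tax_owed < evasion_cost else "evade"
    print(f"{tax_rate:>5.1%} tax: owe ${tax_owed:>9,.0f} -> {choice}")

#  0.1% tax: owe $    1,000 -> comply
# 80.0% tax: owe $  800,000 -> evade
```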
So anyway, I agree that “there will be good enforcement” is plausibly part of the answer. But good enforcement plus low tax will sum up to higher compliance than good enforcement by itself. Unless you think “perfect watertight enforcement” is easy, so that “willingness to comply” becomes completely irrelevant. That strikes me as overly optimistic. Perfect watertight enforcement of anything is practically nonexistent in this world. Perfect watertight enforcement of experimental AGI research would strike me as especially hard. After all, AGI research is feasible to do in a hidden basement / anarchic war zone / barge in the middle of the ocean / secret military base / etc. And there are already several billion GPUs untraceably dispersed all across the surface of Earth.
Um, maybe 3% of the time the defense deathbots have a design defect, and then it doesn’t matter whether they’re ready earlier vs later???
But yeah OK I get where you’re coming from now. (Not an expert.)
This paper estimates that the human retina conveys visual information to the rest of the brain at 1e7 bits/second. I haven’t read the paper though. It’s a bit tricky to compare that to pixels anyway, because I think the retina itself does some data compression. I guess we have 6 million cones, which would be ~2M of each type, so maybe vision-at-any-given-time is ballpark comparable to the information content in a 1 megapixel color image??
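Back-of-envelope version of that comparison (the bit depth and “snapshot rate” are my own guesses):

```python
cones = 6e6                    # ~6 million cones, ~2M of each of 3 types
pixels_equiv = cones / 3       # an RGB pixel is 3 color samples
print(f"~{pixels_equiv / 1e6:.0f} megapixel-ish snapshot")       # ~2 MP

retina_rate = 1e7              # bits/s to the brain, per the cited paper
raw_rate = cones * 8 * 10      # 8 bits/sample at ~10 snapshots/s (my guesses)
print(f"raw ~{raw_rate:.0e} bits/s vs ~{retina_rate:.0e} conveyed")
# raw ~5e+08 vs ~1e+07: consistent with the retina doing heavy compression
```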
I continue to be highly skeptical of the idea that protection against infection declines by an order of magnitude while protection against hospitalization remains unchanged. That combination implies that the hospitalization rate per infection went down by an order of magnitude. Yes, I’ve confirmed that such things are physically possible, but it’s still downright bizarre.
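To spell out the implication with toy numbers (mine, not from any study):

```python
# Toy numbers, not from any study. Unvaccinated baseline:
p_infect = 0.10            # chance of infection over some period
p_hosp_if_infect = 0.05    # chance of hospitalization given infection

ve_hosp = 0.90             # protection vs hospitalization, assumed unchanged
p_hosp_vax = p_infect * p_hosp_if_infect * (1 - ve_hosp)   # fixed at 0.0005

for label, ve_infect in [("early", 0.90), ("months later", 0.0)]:
    p_infect_vax = p_infect * (1 - ve_infect)
    print(f"{label}: P(hosp | infected, vaxed) = {p_hosp_vax / p_infect_vax:.4f}")

# early:        0.0500  (same as unvaccinated)
# months later: 0.0050  (an order of magnitude lower -- the "bizarre" part)
```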
I’m thinking of it like, vaccination builds you an awesome EMP gun to repel the invading robot army. But after a couple months if no robots have ever shown up at my fortress, c’mon, I’m not realistically gonna keep the anti-robot EMP gun charged and manned on my roof turret 24/7, instead I’m obviously going to bring it down to the basement armory for storage.
Then when the robots do come, it’ll take a few minutes for me to bring the gun back up to the roof and mount it to the turret. And in those few minutes, maybe the robots will zap their way through some of my defenses at the far periphery of the fortress, like maybe my first layer or two of barbed wire and landmines. But then once my EMP gun is mounted on the roof turret, oh man, those robots are toast.
Thanks! Oh it’s fine, we can just have a normal discussion. :) Just let me know if I’m insulting your work or stressing you out. :-)
I believe that spontaneous activity is quite rich in information.
Sure. But real-world sensory data is quite rich in information too. I guess my question is: What’s the evidence that the spontaneous activity / “synthetic data” (e.g. retinal waves) is doing things that stimulated activity / “actual data” (e.g. naturalistic visual scenes) can’t do by itself? E.g. “the statistics of spontaneous activity and stimulus-evoked activity are quite similar and get more similar over development” seems to be evidence against the importance of the data being synthetic, because it suggests that actual data would also work equally well. So that would be the “shave a few days off development” story.
In conjunction with what I believe about spontaneous activity inducing very strong & informative priors …
The brain (well, cortex & cerebellum, not so much the brainstem or hypothalamus) does “online learning”. So my “prior” keeps getting better and better. So right now I’m ~1,200,000,000 seconds old, and if I see some visual stimulus right now, the “prior” that I use to process that visual stimulus is informed by everything that I’ve learned in the previous 1,199,999,999 seconds of life, oh plus the previous 21,000,000 seconds in the womb (including retinal waves), plus whatever “prior” you think was hardcoded by the genome (e.g. cortico-cortico connections between certain pairs of regions are more likely to form than other pairs of regions, just because they’re close together and/or heavily seeded at birth with random connections).
Anyway, the point is, I’m not sure if you’re personally doing this, but I do sometimes see a tendency to conflate “prior” with “genetically-hardcoded information”, especially within the predictive processing literature, and I’m trying to push back on that. I agree with the generic idea that “priors are very important” but that doesn’t immediately imply that the things your cortex learns in 10 days (or whatever) of retinal waves are fundamentally different from and more important than the things your cortex learns in the subsequent 10 days of open-eye naturalistic visual stimulation. I think it’s just always true that the first 10 days of data are the prior for the 11th day, and the first 11 days of data are the prior for the 12th day, and the first 12 days of data … etc. etc. And in any particular case, that prior data may be composed of exogenous data vs synthetic data vs some combination of both, but whatever, it’s all going into the same prior either way.
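To make the “yesterday’s posterior is today’s prior” point concrete, here’s a minimal coin-flip example; note that the update rule is completely indifferent to whether the early data was synthetic or real:

```python
# Beta-Bernoulli online updating: each day's posterior is the next day's prior.
alpha, beta = 1.0, 1.0          # whatever prior you start with

days = [
    ("day 1 (could be synthetic data)", [1, 0, 1, 1]),
    ("day 2 (real data)",               [0, 1, 1, 0, 1]),
    ("day 3 (real data)",               [1, 1, 0]),
]
for label, flips in days:
    alpha += sum(flips)                  # heads observed
    beta += len(flips) - sum(flips)      # tails observed
    print(f"after {label}: prior for tomorrow is Beta({alpha}, {beta})")

# The update rule never asks where earlier data came from; it all ends up
# folded into the same prior.
```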
Great post, thanks for writing it!!
The links to http://modem2021.cs.nuigalway.ie/ are down at the moment, is that temporary, or did the website move or something?
Is it fair to say that all the things you’re doing with multi-objective RL could also be called “single-objective RL with a more complicated objective”? Like, if you calculate the vector of values V, and then use a scalarization function S, then I could just say to you “Nope, you’re doing normal single-objective RL, using the objective function S(V).” Right?
(Not that there’s anything wrong with that, just want to make sure I understand.)
…this pops out at me because the two reasons I personally like multi-objective RL are not like that. Instead they’re things that I think you genuinely can’t do with one objective function, even a complicated one built out of multiple pieces combined nonlinearly. Namely, (1) transparency/interpretability [because a human can inspect the vector V], and (2) real-time control [because a human can change the scalarization function on the fly]. Incidentally, I think (2) is part of how brains work; an example of the real-time control is that if you’re hungry, entertaining a plan that involves eating gets extra points from the brainstem/hypothalamus (positive coefficient), whereas if you’re nauseous, it loses points (negative coefficient). That’s my model anyway, you can disagree :) As for transparency/interpretability, I’ve suggested that maybe the vector V should have thousands of entries, like one for every word in the dictionary … or even millions of entries, or infinity, I dunno, can’t have too much of a good thing. :-)
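To make (1) and (2) concrete, here’s a toy sketch (all names, numbers, and the linear scalarization are placeholders I made up):

```python
import numpy as np

# Toy value vector for a candidate plan: one entry per named objective.
objectives = ["eat", "rest", "explore"]
V = np.array([0.8, 0.2, 0.5])        # learned values, inspectable by a human

def scalarize(V, weights):
    """Linear scalarization. From the outside this is just 'single-objective
    RL with objective S(V)', but S can be swapped out at runtime."""
    return float(weights @ V)

hungry   = np.array([+2.0, 0.5, 1.0])  # hunger: eating-plans get extra points
nauseous = np.array([-2.0, 0.5, 1.0])  # nausea: eating-plans lose points

print(scalarize(V, hungry))    #  2.2 -> plan looks good
print(scalarize(V, nauseous))  # -1.0 -> the very same plan now looks bad
```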
Interestingly, I think you put a lot more importance on this “synthetic data” than I do. I want to call the “synthetic data” thing “a neat trick that maybe speeds up development by a few days or something”. If retinal waves are more important than that, I’d be inclined to think that they have some other role instead of (or in addition to) “synthetic data”.
It seems to me that retinal waves just don’t carry all that many bits of information, not enough for it to be really spiritually similar to the ML notion of “pretraining a model”, or to explain why current ML models need so much more data. I mean, compare the information content of retinal waves versus the information content of an adult human brain, or even adult rat brain. It has to be a teeny tiny fraction of a percent.
If I was talking about the role of evolution, I would talk instead about how evolution designed the learning algorithm, and inference algorithm, and neural architecture, and hyperparameter settings (some of which are dynamic depending on both where-we-are-in-development and moment-by-moment arousal level etc.), and especially the reward function. And then I would throw “oh and evolution also designed synthetic data for pretraining” into a footnote or something. :-P
I’m not sure exactly why we disagree on this, or how to resolve it. You’re obviously more knowledgeable about the details here, I dunno.
I was writing a kinda long reply but maybe I should first clarify: what do you mean by “model”? Can you give examples of ways that I could learn something (or otherwise change my synapses within a lifetime) that you wouldn’t characterize as “changes to my mental model”? For example, which of the following would be “changes to my mental model”?
1. I learn that Brussels is the capital of Belgium
2. I learn that it’s cold outside right now
3. I taste a new brand of soup and find that I really like it
4. I learn to ride a bicycle, including
    - maintaining balance via fast hard-to-describe responses where I shift my body in certain ways in response to different sensations and perceptions
    - being able to predict how the bicycle and me would move if I swung my arm around
5. I didn’t sleep well so now I’m grumpy
FWIW my inclination is to say that 1-4 are all “changes to my mental model”. And 5 involves both changes to my mental model (knowing that I’m grumpy), and changes to the inputs to my mental model (I feel different “feelings” than I otherwise would—I think of those as inputs going into the model, just like visual inputs go into the model). Is there anything wrong / missing / suboptimal about that definition?
This one kinda confuses me. I’m of the opinion that the human brain is “constructed with a model explicitly, so that identifying the model is as simple as saying “the model is in this sub-module, the one labelled ‘model’”.” Of course the contents of the model are learned, but I think the question of whether any particular plastic synapse is or is not part of the information content of the model will have a straightforward yes-or-no answer. If that’s right, then “it’s hard to find the model (if any) in a trained model-free RL agent” is a disanalogy to “AIs learning human values”. It would be more analogous to just train a MuZero clone, which has a labeled “model” component, instead of training a model-free RL.
And then looking at weights and activations would also be disanalogous to “AIs learning human values”, since we probably won’t have those kinds of real-time-brain-scanning technologies, right?
Sorry if I’m misunderstanding.
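To illustrate the “labeled model sub-module” point from above, here’s a minimal sketch, loosely MuZero-shaped (all names and sizes are hypothetical):

```python
import torch.nn as nn

class MuZeroStyleAgent(nn.Module):
    """Loosely MuZero-shaped toy agent: the world model is an explicitly
    labeled sub-module, so "is this weight part of the model?" has a crisp
    yes-or-no answer."""
    def __init__(self, obs_dim=64, latent_dim=32, n_actions=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)
        # THE MODEL: predicts the next latent state from (state, action).
        self.model = nn.Linear(latent_dim + n_actions, latent_dim)
        # NOT the model: value and policy heads that consume latent states.
        self.value_head = nn.Linear(latent_dim, 1)
        self.policy_head = nn.Linear(latent_dim, n_actions)

agent = MuZeroStyleAgent()   # agent.model is the model, by construction
```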
I think the way I would say it, which might or might not be exactly the same as what you’re saying, is:
Brains involve learning algorithms, trained on predictive loss (among other things). Learning algorithms need input data. In the case of rat vision, you don’t get much “real” input data until quite late in the game (14 days after birth). If that’s when the vision learning algorithm starts from scratch, then it would take even longer beyond those 14 days before the learning algorithm has produced a decent predictive model, and there’s risk that the predictive model will still kinda suck when the rat is going out into the world and relying on it for decision-making.
To help mitigate that problem, the learning process kicks off earlier. Before there’s any real visual data yet, there are retinal waves. We can think of these as creating synthetic input data that can start (pre)training the learning algorithms. Obviously, the synthetic data is not the same as actual real-world naturalistic vision data; therefore the model will obviously need to be further trained on the real data once we have it. But still, if the synthetic data is similar enough to the real data (in the ways that matter), this retinal wave pre-training could be better than nothing.
Anyway, if that’s what you’re saying, I view it as a plausible hypothesis. (But it’s also possible retinal waves are doing something entirely different, I wouldn’t know either way.)
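If it helps, here’s that hypothesis in toy-ML form; the “waves” and the predictor are stand-ins I made up, not a claim about what retinal waves actually look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def waves(n_seq, length=50):
    """Stand-in 'retinal waves': smooth drifting sinusoids, not real scenes."""
    phase = rng.uniform(0, 2 * np.pi, n_seq)[:, None]
    t = np.arange(length)[None, :]
    return np.sin(0.3 * t + phase)

def fit_next_step_predictor(seqs, w_init, lr=0.01, epochs=200):
    """Tiny linear predictor x[t+1] ~ w * x[t], trained by gradient descent."""
    w = w_init
    for _ in range(epochs):
        x, y = seqs[:, :-1], seqs[:, 1:]
        grad = 2 * ((w * x - y) * x).mean()
        w -= lr * grad
    return w

w = fit_next_step_predictor(waves(100), w_init=0.0)   # pre-train on waves
# ... eyes open (day 14 for the rat): keep training the SAME w on real data ...
# w = fit_next_step_predictor(real_scenes, w_init=w)
# If wave statistics resemble real-scene statistics in the ways that matter,
# w starts out closer to a good model than a blank init would.
```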
I agree that if there are many paths to AGI, then the time-to-AGI is the duration of the shortest one, and therefore when I talk about one specific scenario, it’s only an upper bound on time-to-AGI.
(Unless we can marshal strong evidence that one path to AGI would give a better / safer / whatever future than another path, and then do differential tech development including trying to shift energy and funding away from the paths we don’t like. We don’t yet have that kind of strong evidence, unfortunately, in my opinion. Until that changes, yeah, I think we’re just gonna get whatever kind of AGI is easiest for humans to build.)
I guess I’m relatively skeptical about today’s most popular strands of deep ML research leading to AGI, at least compared to the median person on this particular web-forum. See here for that argument. I think I’m less skeptical than the median neuroscientist though. I think it’s just really hard to say that kind of thing with any confidence. And also, even if it turns out that deep neural networks can’t do some important-for-intelligence thing X, well somebody’s just gonna glue together a deep neural network with some other algorithm that does X. And then we can have some utterly pointless semantic debate about whether it’s still fundamentally a deep neural network or not. :-)
Oh wow, cool! I was still kinda confused when I wrote this post and comment thread above, but a couple months later I wrote Big Picture of Phasic Dopamine which sounds at least somewhat related to what you’re talking about and in particular talks a lot about basal ganglia loops.
Oh, except that post leaves out the cerebellum (for simplicity). I have a VERY simple cerebellum story (see the one-sentence version here) … I’ve done some poking around the literature and talking to people about it, but anyway I currently still stand by my story and am still confused about why all the other cerebellum-modelers make things so much more complicated than that. :-P
We do sound on the same page … I’d love to chat, feel free to email or DM me if you have time.
Ah, you mean that “alignment” is a different problem than “subhuman and human-imitating training safety”? :P
“Quantilizing from the human policy” is human-imitating in a sense, but also superhuman. At least modestly superhuman—depends on how hard you quantilize. (And maybe very superhuman in speed.)
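For concreteness, here’s a toy q-quantilizer over a stand-in “human policy” (the policy and utility function are invented placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantilize(human_policy_sampler, utility, q=0.1, n=10_000):
    """Sample actions from the human policy, then pick uniformly from the
    top q fraction by estimated utility. q=1 imitates; q->0 optimizes harder."""
    actions = human_policy_sampler(n)
    scores = utility(actions)
    cutoff = np.quantile(scores, 1 - q)
    top = actions[scores >= cutoff]
    return top[rng.integers(len(top))]

# Stand-ins: "human policy" = noisy actions around 0; utility = action value.
human = lambda n: rng.normal(0.0, 1.0, n)
utility = lambda a: a

print(quantilize(human, utility, q=0.5))   # mildly superhuman pick
print(quantilize(human, utility, q=0.01))  # quantilizing harder: more extreme
```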
If you could fork your brain state to create an exact clone, would that clone be “aligned” with you? I think that we should define the word “aligned” such that the answer is “yes”. Common sense, right?
Seems to me that if you say “yes it’s aligned” to that question, then you should also say “yes it’s aligned” to a quantilize-from-the-human-policy agent. It’s kinda in the same category, seems to me.
Hmm, Stuart Armstrong suggested here that “alignment is conditional: an AI is aligned with humans in certain circumstances, at certain levels of power.” So then maybe as you quantilize harder and harder, you get less and less confident in that system’s “alignment”?
(I’m not sure we’re disagreeing about anything substantive, just terminology, right? Also, I don’t actually personally buy into this quantilization picture, to be clear.)
The biologist answer there seems to be question-begging
Yeah, I didn’t bother trying to steelman the imaginary biologist. I don’t agree with them anyway, and neither would you.
(I guess I was imagining the biologist belonging to the school of thought (which again I strongly disagree with) that says that intelligence doesn’t work by a few legible algorithmic principles, but is rather a complex intricate Rube Goldberg machine, full of interrelated state variables and so on. So we can’t just barge in and make some major change in how the step-by-step operations work, without everything crashing down. Again, I don’t agree, but I think something like that is a common belief in neuroscience/CogSci/etc.)
it seems hard for parallelization in learning to not be useful … why am I harmed …
I agree with “useful” and “not harmful”. But an interesting question is: Is it SO helpful that parallelization can cut the serial (subjective) time from 30 years to 15 years? Or what about 5 years? 2 years? I don’t know! Again, I think at least some brain-like learning has to be serial (e.g. you need to learn about multiplication before nonabelian cohomology), but I don’t have a good sense for just how much.
Thanks, that’s really helpful. I’m going to re-frame what you’re saying in the form of a question:
The parallel-experiences question:
Take a model which is akin to an 8-year-old’s brain. (Assume we deeply understand how the learning algorithm works, but not how the trained model works.) Now we make 10 identical copies of that model. For the next hour, we tell one copy to read a book about trucks, and we tell another copy to watch a TV show about baking, and we tell a third copy to build a sandcastle in a VR environment, etc. etc., all in parallel.
At the end of the hour, is it possible to take ALL the things that ALL ten copies learned, and combine them into one model—one model that now has new memories/skills pertaining to trucks AND baking AND sandcastles etc.—and it’s no worse than if the model had done those 10 things in series?
What’s the answer to this question?
Here are three possibilities:
How an ML practitioner would probably answer this question: I think they would say “Yeah duh, we’ve been doing that in ML since forever” (see the toy weight-averaging sketch after this list). For my part, I do see this as some evidence, but I don’t see it as definitive evidence, because the premise of this post (see Section 1) is that the learning algorithms used by ML practitioners today are substantially different from the within-lifetime learning algorithm used in the brain.
How a biologist would probably answer this question: I think they would say the exact opposite: “No way!! That’s not something brains evolved to do, there’s no reason to expect it to be possible and every reason to think it isn’t. You’re just talking sci-fi nonsense.”
(Well, they would acknowledge that humans working on a group project could go off and study different topics, and then talk to each other and hence teach each other what they’ve learned. But that’s kind of a different thing than what we’re talking about here. In particular, for non-superhuman AIs-in-training, we already have tons of pedagogical materials like human textbooks and lectures. So I don’t see teams-of-AIs-who-talk-to-each-other being all that helpful in getting to superhuman faster.)
How I would answer this question: Well I hadn’t thought about it until now, but I think I’m in between. On the one hand, I do think there are some things that need to be learned serially in the human brain learning algorithm. For example, there’s a good reason that people learn multiplication before exponentiation, and exponentiation before nonabelian cohomology, etc. But if the domains are sufficiently different, and if we merge-and-re-split frequently enough, then I’m cautiously optimistic that we could do parallel experiences to some extent, in order to squeeze 30 subjective years of experience into <30 serial subjective years of experience. How much less than 30, I don’t know.
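Here’s the sort of thing the ML practitioner has in mind, as a toy sketch (FedAvg-style weight averaging; the “training” here is a made-up stand-in):

```python
import numpy as np

# Toy "merge": train 10 copies of the same initial weights on different data,
# then average the weights. Whether the BRAIN's within-lifetime learning
# algorithm tolerates anything like this is exactly the open question.
rng = np.random.default_rng(0)
base = rng.standard_normal(1000)           # the 8-year-old's weights

copies = []
for topic in range(10):                    # trucks, baking, sandcastles, ...
    delta = 0.01 * rng.standard_normal(1000)  # stand-in for 1 hour of learning
    copies.append(base + delta)

merged = np.mean(copies, axis=0)           # one model with all ten updates?
# For today's deep nets trained from a shared init, this often works passably;
# nobody knows the brain-algorithm analogue.
```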
Anyway, in the article I used the biologist answer: “the human brain within-lifetime learning algorithm is not compatible with parallel experiences”. So that would be the most conservative / worst-case assumption.
I am editing the article to note that this is another reason to suspect that training might be faster than the worst-case. Thanks again for pointing that out.