Someone who is interested in learning and doing good.
My Twitter: https://twitter.com/MatthewJBar
My Substack: https://matthewbarnett.substack.com/
I think you missed some basic details about what I wrote. I encourage people to compare what Eliezer is saying here to what I actually wrote. You said:
If you think you’ve demonstrated by clever textual close reading that Eliezer-2018 or Eliezer-2008 thought that it would be hard to get a superintelligence to understand humans, you have arrived at a contradiction and need to back up and start over.
I never said that you or any other MIRI person thought it would be “hard to get a superintelligence to understand humans”. Here’s what I actually wrote:
Non-MIRI people sometimes strawman MIRI people as having said that AGI would literally lack an understanding of human values. I don’t endorse this, and I’m not saying this.
[...]
I agree that MIRI people never thought the problem was about getting AI to merely understand human values, and that they have generally maintained there was extra difficulty in getting an AI to care about human values. However, I distinctly recall MIRI people making a big deal about the value identification problem (AKA the value specification problem), for example in this 2016 talk from Yudkowsky.[3] The value identification problem is the problem of “pinpointing valuable outcomes to an advanced agent and distinguishing them from non-valuable outcomes”. In other words, it’s the problem of specifying a function that reflects the “human value function” with high fidelity.
I mostly don’t think that the points you made in your comment respond to what I said. My best guess is that you’re responding to a stock character who represents the people who have given similar arguments to you repeatedly in the past. In light of your personal situation, I’m actually quite sympathetic to you responding this way. I’ve seen my fair share of people misinterpreting you on social media too. It can be frustrating to hear the same bad arguments, often from people with poor intentions, over and over again, and to keep engaging thoughtfully each time. I just don’t think I’m making the same mistakes as those people. I tried to distinguish myself from them in the post.
I would find it slightly exhausting to reply to all of this comment, given that I think you misrepresented me in a big way right out of the gate, so I’m currently not sure if I want to put in the time to compile a detailed response.
That said, I think some of the things you said in this comment were nice, and helped to clarify your views on this subject. I admit that I may have misinterpreted some of the comments you made, and if you provide specific examples, I’m happy to retract or correct them. I’m thankful that you spent the time to engage. :)
I’m curious if you have any thoughts on the effect regulations will have on AI timelines. To have a transformative effect, AI would likely need to automate many forms of management, which involves making a large variety of decisions without the approval of other humans. The obvious effect of deploying these technologies will therefore be to radically upend our society and way of life, taking control away from humans and putting it in the hands of almost alien decision-makers. Will bureaucrats, politicians, voters, and ethics committees simply stand idly by while the tech industry takes over our civilization like this?
On the one hand, it is true that cars, airplanes, electricity, and computers were all introduced with relatively few regulations. These technologies went on to change our lives greatly in the last century and a half. On the other hand, nuclear power, human cloning, genetic engineering of humans, and military weapons each have a comparable potential to change our lives, and yet are subject to tight regulations, both formally, as the result of government-enforced laws, and informally, as engineers regularly refuse to work on these technologies indiscriminately, fearing backlash from the public.
One objection is that it is too difficult to slow down AI progress. I don’t buy this argument.
A central assumption of the Bio Anchors model, and all hardware-based models of AI progress more generally, is that getting access to large amounts of computation is a key constraint to AI development. Semiconductor fabrication plants are easily controllable by national governments and require multi-billion dollar upfront investments, which can hardly evade the oversight of a dedicated international task force.
We saw in 2020 that, if threats are big enough, governments have no problem taking unprecedented action, quickly enacting sweeping regulations of our social and business life. If anything, a global limit on manufacturing a particular technology enjoys even more precedent than, for example, locking down over half of the world’s population under some sort of stay-at-home order.
Another argument states that the incentives to make fast AI progress are simply too strong: first mover advantages dictate that anyone who creates AGI will take over the world. Therefore, we should expect investments to accelerate dramatically, not slow down, as we approach AGI. This argument has some merit, and I find it relatively plausible. At the same time, it relies on a very pessimistic view of international coordination that I find questionable. A similar first-mover advantage was also observed for nuclear weapons, prompting Bertrand Russell to go as far as saying that only a world government could possibly deter nations from developing and using nuclear weapons. Yet, I do not think this prediction was borne out.
Finally, it is possible that the timeline you state here is conditioned on no coordinated slowdowns. I sometimes see people making this assumption explicit, and in your report you state that you did not attempt to model “the possibility of exogenous events halting the normal progress of AI research”. At the same time, if regulation ends up mattering a lot—say, it delays progress by 20 years—then all the conditional timelines will look pretty bad in hindsight, as they will have ended up omitting one of the biggest, most determinative factors of all. (Of course, it’s not misleading if you just state upfront that it’s a conditional prediction).
If you are completely unfamiliar with the actual science on obesity you probably think that’s dumb because obesity is caused by high-palatability foods. Read the first page linked if you’d prefer to know why that’s obviously wrong.
I admit to being, at present, persuaded by the high-palatability hypothesis, which I roughly translate into the following thesis: “The general rise in obesity is primarily explained by the rise of highly processed, addicting foods, which raises our natural set point, tricking our bodies into eating more calories than we ‘need’ before feeling full.”
I read the posts you linked (you referred to this one, right?), and I’m not convinced by them, but I’m open to people explaining why they think I’m still wrong.
First I’ll summarize the article briefly, and then respond to each point.
My brief summary
The series begins by outlining 8 mysteries:
Obesity has gotten a lot worse over time
Obesity abruptly got worse some time in the 1970s
There’s good evidence that we’re not winning the war against obesity
Hunter-gatherers don’t become obese
Lab animals and wild animals have also become obese over time
People and animals gain a lot of weight when exposed to palatable foods
People at higher altitudes seem to get obese at a lower frequency
Diets are not effective at reducing obesity, for nearly everyone
The series continues by arguing that CICO (as in calories in, calories out) cannot explain the current crisis, and cites an array of evidence against that model. Given the inadequacy of CICO as a model for weight gain, the author concludes, the current obesity crisis must instead be due to environmental contaminants, which neatly fit each of the 8 mysteries.
My interpretation of the mysteries
In my opinion, assuming the high-palatability hypothesis, very few of the mysteries are actually “mysteries” in the sense of being surprising.
For example, we can explain mystery 1 by saying that high-palatability foods have become more common over time (duh). We can explain 3 because very few people are effectively targeted by anti-obesity campaigns, and it’s intractable to simply ban high-palatability food (which is probably the only solution that would actually work on a large scale, short of advanced technology). We can explain 4 by pointing out that hunter-gatherers don’t eat high-palatability food. We can explain 6 for obvious reasons. We can explain 8 by pointing out that people don’t have unlimited willpower, and thus, don’t rigidly adhere to a dieting plan when given abundant choices to “cheat” and eat high-palatability food (which is highly addictive).
That leaves mysteries 2, 5 and 7, which I do think call out for more explanation. However,
Mystery 2 is practically equally mysterious under both the environmental contaminant hypothesis and the high-palatability hypothesis, since by the author’s admission, they have little idea about what chemicals were abruptly introduced into the environment starting in the 1970s. At the same time, I found their argument that foods were already highly palatable before the 1970s to be weak.
Sure, you can name a few palatable foods from before the 1970s (Oreos, Doritos, Twinkies, Coca-Cola), but I don’t find it particularly unlikely that the absolute number and variety of high-palatability foods has increased greatly since the 1970s, given the immense pressure for food corporations to hyper-optimize their food for consumption.
Mystery 5 is only a real mystery if indeed animals under controlled conditions are getting fatter over time. The author presents two sources for this claim.
Source one states in its abstract, “We examined samples collectively consisting of over 20 000 animals from 24 populations (12 divided separately into males and females) of animals representing eight species living with or around humans in industrialized societies.” The palatability hypothesis can elegantly explain what’s going on here. Animals who live near cities are exposed to human trash, and humans throw a lot of high-palatability food away. Animals eat the trash and get addicted to it, raising their set point, causing them to overconsume calories. Animals that live with humans get fed human-produced food.
Source two is about horses, and I lack a coherent explanation for the details. But this is mostly because I don’t know how common it is for horses to eat hyper-palatable food, as I have very little experience with common horse-feeding practices. Overall I wasn’t able to find compelling evidence that animals in controlled conditions that don’t eat high-palatability foods are experiencing increasing rates of obesity. (Though, of course, I might have missed this evidence in the sources). [Edit: it looks like I was mistaken and the first source includes laboratory rats and mice in the study.]
As far as I can tell, the most surprising mystery is 7. The author presents impressive evidence regarding altitude anorexia, and studies that looked into alternative factors (including carbon dioxide and oxygen).
EDIT: I now think that low oxygen is the leading culprit behind altitude anorexia, even though the author says it isn’t. Their evidence against the oxygen hypothesis is the following: one study found a small effect, and another study was methodologically flawed. Putting aside the second study, the effect found in the first study was not small at all in my opinion; in fact, it found that people who exercised in a low oxygen environment lost about 60% more weight than those who didn’t! Scott Alexander has written about this and finds the oxygen hypothesis plausible.
Yet, all things considered, I still don’t think that enough alternative hypotheses have been explored to say that mystery 7 is anywhere near conclusive. It’s well-known that obesity rates vary by demographic groups, and that there are genetic confounders involved, and demographic groups are also not evenly distributed between high and low altitudes.
In poorer nations, such as China, it seems highly plausible to me that altitude correlates strongly with access to supermarkets and fast-food restaurants that carry lots of high-palatability foods. Urban centers are generally clustered in low-altitude areas, along coasts and alongside rivers. If people in urban areas are exposed to more high-palatability foods, as opposed to more traditional dishes, then it seems obvious that you’ll find a correlation between altitude and obesity. The contamination hypothesis is not needed to explain this fact.
My take on CICO
Given that I don’t find any of the mysteries very surprising (with the possible exception of 7), I don’t see why the contamination hypothesis falls out as a parsimonious explanation of the data. Admittedly, however, my main disagreement probably boils down to the section on the plausibility of CICO.
Being honest, I found many parts of the CICO post riddled with misleading statements: sometimes simply confusing CICO with the idea that diets and attempts-to-increase-willpower work (which I emphatically do not believe), or strawmanning CICO into a generic position that absolutely nothing other than calories and exercise matters, or that an excess 3500 calories precisely and linearly adds 1 pound of fat to your body.
Obviously other factors, including genetics, matter. Obviously diets do not work on a large scale. And obviously the formula is not as easy as “eating an extra 3500 calories always means you gain an extra pound, even extrapolated to people eating 10,000 calories a day.” None of these facts are strongly inconsistent with the high-palatability hypothesis as the dominant explanation of the data. In my opinion, these are quibbles, not knock-down arguments.
And, in any case, the author admits,
Sure, consumption in the US went from 2,025 calories per day in 1970 to 2,481 calories per day in 2010, a difference of 456 calories.
That’s a lot! As someone who has very carefully controlled my eating before, I saw first-hand how eating at a 500 calorie deficit made me lose weight, and conversely, how eating at a 500 calorie surplus made me gain weight. The author seems quick to handwave this fact away, as if a few hundred calories can’t add up over time. Their interlude responding to objections on this point also seems handwavey to me, and doesn’t give any evidence inconsistent with the high-palatability hypothesis.
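As a rough back-of-the-envelope illustration (mine, not the original post’s or its author’s), here is what that reported increase looks like under even the naive 3500-calories-per-pound heuristic, which, as I said above, is not literally correct:

```python
# Rough illustration using only the numbers quoted above. The 3500 kcal/lb rule is
# the naive linear heuristic already criticized in this thread; real weight gain is
# sublinear because energy expenditure rises as body mass rises.

daily_surplus_kcal = 2481 - 2025   # reported rise in average US daily intake, 1970 -> 2010
kcal_per_pound = 3500              # naive heuristic, not a precise biological constant
years = 40                         # 1970 to 2010

naive_gain_lb = daily_surplus_kcal * 365 * years / kcal_per_pound
print(f"Daily surplus: {daily_surplus_kcal} kcal")
print(f"Naive linear extrapolation over {years} years: {naive_gain_lb:,.0f} lb")
# Prints roughly 1,900 lb, an absurd number, which is exactly why nobody defends the
# strictly linear rule: intake and expenditure settle at a new, higher equilibrium.
# The point is only that a few hundred extra calories per day is not a small change.
```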
Conclusion
Given a biologically plausible mechanism, its consistency with practically all the “mysteries”, common sense, and general scientific wisdom (from what I gather), it seems highly likely to me that the high-palatability hypothesis is correct. This, in my opinion, diminishes the case that money should be spent investigating alternative hypotheses (though the value of being proven wrong might be so high that it’s worth it anyway).
Perhaps explain your story in more detail. Others might find it interesting.
Re: Values are easy to learn, this mostly seems to me like it makes the incredibly-common conflation between “AI will be able to figure out what humans want” (yes; obviously; this was never under dispute) and “AI will care” (nope; not by default; that’s the hard bit).
I think it’s worth reflecting on what type of evidence would be sufficient to convince you that we’re actually making progress on the “caring” bit of alignment and not merely the “understanding” bit. Because I currently don’t see what type of evidence you’d accept beyond near-perfect mechanistic interpretability.
I think current LLMs demonstrate a lot more than mere understanding of human values; they seem to actually ‘want’ to do things for you, in a rudimentary behavioral sense. When I ask GPT-4 to do some task for me, it’s not just demonstrating an understanding of the task: it’s actually performing actions in the real world that result in the task being completed. I think it’s totally reasonable, prima facie, to admit this as evidence that we are making some success at getting AIs to “care” about doing tasks for users.
It’s not extremely strong evidence, because future AIs could be way harder to align, maybe there’s ultimately no coherent sense in which GPT-4 “cares” about things, and perhaps GPT-4 is somehow just “playing the training game” despite seemingly having limited situational awareness.
But I think it’s valid evidence nonetheless, and I think it’s wrong to round this datum off to a mere demonstration of “understanding”.
We typically would not place such a high standard on other humans. For example, if a stranger helped you in your time of need, you might reasonably infer that the stranger cares about you to some extent, not merely that they “understand” how to care about you, or that they are merely helping people out of a desire to appear benevolent as part of a long-term strategy to obtain power. You may not be fully convinced they really care about you because of a single incident, but surely it should move your credence somewhat. And further observations could move your credence further still.
Alternative explanations of aligned behavior we see are always logically possible, and it’s good to try to get a more mechanistic understanding of what’s going on before we confidently declare that alignment has been solved. But behavioral evidence is still meaningful evidence for AI alignment, just as it is for humans.
For people reading this post in the future, I’d like to note that I have written a somewhat long comment describing my mixed feelings about this post, since posting it. You can find my comment here. But I’ll also repeat it below for completeness:
The first thing I’d like to say is that we intended this post as a bet, and only a bet, and yet some people seem to be treating it as if we had made an argument. Personally, I am uncomfortable with the suggestion that our post was “misleading” because we did not present an affirmative case for our views.
I agree that LessWrong culture benefits from arguments as well as bets, but it seems a bit weird to demand that every bet come with an argument attached. A norm that all bets must come with arguments would seem to substantially dampen the incentives to make bets, because then each time people must spend what will likely be many hours painstakingly outlining their views on the subject.
That said, I do want to reply to people who say that our post was misleading on other grounds. Some said that we should have made different bets, or at different odds. In response, I can only say that coming up with good concrete bets about AI timelines is actually really damn hard, and so if you wish to come up with alternatives, you can be my guest. I tried my best, at least.
More people said that our bet was misleading since it would seem that we too (Tamay and I) implicitly believe in short timelines, because our bets amounted to the claim that AGI has a substantial chance of arriving in 4-8 years. However, I do not think this is true.
The type of AGI that we should be worried about is one that is capable of fundamentally transforming the world. More narrowly, and to generalize a bit, fast takeoff folks believe that we will only need a minimal seed AI that is capable of rewriting its source code, and recursively self-improving into superintelligence. Slow takeoff folks believe that we will need something capable of automating a wide range of labor.
Given the fast takeoff view, it is totally understandable to think that our bets imply a short timeline. However, (and I’m only speaking for myself here) I don’t believe in a fast takeoff. I think there’s a huge gap between AI doing well on a handful of benchmarks, and AI fundamentally re-shaping the economy. At the very least, AI has been doing well on a ton of benchmarks since 2012. Each time AI excels at one benchmark, a new one is usually invented that’s a bit tougher, and hopefully gets us a little closer to measuring what we actually mean by general intelligence.
In the near-future, I hope to create a much longer and more nuanced post expanding on my thoughts on this subject, hopefully making it clear that I do care a lot about making real epistemic progress here. I’m not just trying to signal that I’m a calm and arrogant long-timelines guy who raises his nose at the panicky short timelines people, though I understand how my recent post could have given that impression.
[This comment has been superseded by this post, which is a longer elaboration of essentially the same thesis.]
Recently many people have talked about whether MIRI people (mainly Eliezer Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value alignment is easier than they thought given that GPT-4 seems to understand human values pretty well. Instead of linking to these discussions, I’ll just provide a brief caricature of how I think this argument has gone in the places I’ve seen it. Then I’ll offer my opinion that, overall, I do think that MIRI people should probably update in the direction of alignment being easier than they thought, despite their objections.
Here’s my very rough caricature of the discussion so far, plus my contribution:
Non-MIRI people: “Eliezer talked a great deal in the sequences about how it was hard to get an AI to understand human values. For example, his essay on the Hidden Complexity of Wishes made it sound like it would be really hard to get an AI to understand common sense. Actually, it turned out that it was pretty easy to get an AI to understand common sense, since LLMs are currently learning common sense. MIRI people should update on this information.”
MIRI people: “You misunderstood the argument. The argument was never about getting an AI to understand human values, but about getting an AI to care about human values in the first place. Hence ‘The genie knows but does not care’. There’s no reason to think that GPT-4 cares about human values, even if it can understand them. We always thought the hard part of the problem was about inner alignment, or, pointing the AI in a direction you want. We think figuring out how to point an AI in whatever direction you choose is like 99% of the problem; the remaining 1% of the problem is getting it to point at the “right” set of values.”
Me:
I agree that MIRI people never thought the problem was about getting AI to merely understand human values, and that they have always said there was extra difficulty in getting an AI to care about human values. But I distinctly recall MIRI people making a big deal about how the value identification problem would be hard. The value identification problem is the problem of creating a function that correctly distinguishes valuable from non-valuable outcomes. A foreseeable difficulty with the value identification problem—which was talked about extensively—is the problem of edge instantiation.
I claim that GPT-4 is pretty good at distinguishing valuable from non-valuable outcomes, unless you require something that vastly exceeds human performance on this task. In other words, GPT-4 looks like it’s on a path towards an adequate solution to the value identification problem, where “adequate” means “about as good as humans”. And I don’t just mean that GPT-4 “understands” human values well: I mean that asking it to distinguish valuable from non-valuable outcomes generally works well as an approximation of the human value function in practice. Therefore it is correct for non-MIRI people to point out that this problem is less difficult than some people assumed in the past.
Crucially, I’m not saying that GPT-4 actually cares about maximizing human value. I’m saying that it’s able to transparently pinpoint to us which outcomes are bad and which outcomes are good, with fidelity approaching that of an average human. Importantly, GPT-4 can tell us which outcomes are valuable “out loud” (in writing), rather than merely passively knowing this information. This element is key to what I’m saying because it means that we can literally just ask a multimodal GPT-N about whether an outcome is bad or good, and use that as an adequate “human value function”.
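To make that concrete, here is a minimal sketch of the kind of procedure I have in mind: query a capable model directly and treat its answer as an approximate value function. The `query_llm` helper is a hypothetical placeholder for whatever chat-model API one uses, not any particular vendor’s interface:

```python
# Minimal sketch: use a language model's stated judgment as an approximate value
# function over described outcomes. `query_llm` is a hypothetical stand-in; wire it
# up to whichever LLM API you actually use.

def query_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a capable chat model and return its text reply."""
    raise NotImplementedError("connect this to a real LLM API")

def value_estimate(outcome_description: str) -> float:
    """Score an outcome in [0, 1] by asking the model directly, as described above."""
    prompt = (
        "On a scale from 0 (clearly bad by ordinary human lights) to 1 (clearly good), "
        "rate the following outcome. Reply with a single number only.\n\n"
        f"Outcome: {outcome_description}"
    )
    reply = query_llm(prompt)
    try:
        return min(1.0, max(0.0, float(reply.strip())))
    except ValueError:
        return 0.5  # unparseable reply: fall back to "uncertain"

# Hypothetical usage: value_estimate("A cure for malaria is distributed worldwide")
# should come back near 1.0, while obviously catastrophic outcomes should score near 0.
```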
The supposed reason why the value identification problem was hard is that human value is complex. In fact, that’s mentioned as the central foreseeable difficulty on the Arbital page. Complexity of value was used as an explicit premise in the argument for why AI alignment would be difficult many times in MIRI’s history (two examples: 1, 2), and it seems like the reason for this premise was that it was supposed to provide an intuition for why the value identification problem would be hard. If the value identification problem was never predicted to be hard, then what was the point of making a fuss about complexity of value in the first place?
In general, there are (at least) two ways that someone can fail to follow your intended instructions. Either your instructions aren’t well-specified, or the person doesn’t want to obey your instructions even if the instructions are well-specified. All the evidence that I’ve found seems to indicate that MIRI people thought that both problems would be hard for AI, not merely the second problem. For example, a straightforward literal interpretation of Nate Soares’ 2017 talk supports this interpretation.
It seems to me that the following statements are true:
MIRI people used to think that it would be hard to both (1) develop an explicit function that corresponds to the “human utility function” with accuracy comparable to that of an average human, and (2) separately, get an AI to care about maximizing this function. The idea that MIRI people only ever thought (2) was the hard part seems false, and unsupported by the links above.
Non-MIRI people often strawman MIRI people as thinking that AGI would literally lack an understanding of human values.
The “complexity of value” argument pretty much just tells us that we need an AI to learn human values, rather than hardcoding a utility function from scratch. That’s a meaningful thing to say, but it doesn’t tell us much about whether alignment is hard; it just means that extremely naive approaches to alignment won’t work.
As an effective altruist, I like to analyze how altruistic cause areas fare on three different axes: importance, tractability and neglectedness. The arguments you gave for the importance of aging are compelling to me (at least from a short-term, human-focused perspective). I’m less convinced that anti-aging efforts are worth it according to the other axes, and I’ll explain some of my reasons here.
The evidence is promising that in the next 5-10 years, we will start seeing robust evidence that aging can be therapeutically slowed or reversed in humans.
[...]
In the lab, we have demonstrated that various anti-aging approaches can extend healthy lifespan in many model organisms including yeast, worms, fish, flies, mice and rats. Life extension of model organisms using anti-aging approaches ranges from 30% to 1000%:
When looking at the graph you present, a clear trend emerges: the more complex and larger the organism, the less progress we have made on slowing aging for that organism. Given that humans are much more complex and larger than the model organisms you presented, I’d caution against extrapolating lab results to them.
I once heard from a cancer researcher that we had, for all practical purposes, cured cancer in mice, but the results have not yet translated into humans. Whether or not this claim is true, it’s clear that progress has been slower than the starry-eyed optimists had expected back in 1971, when the War on Cancer was declared.
That’s not to say that there hasn’t been progress in cancer research, or biological research more broadly. It’s just that progress tends to happen gradually. I don’t doubt that we can achieve modest success; I think it’s plausible (>30% credence) that we will have FDA approved anti-aging treatments by 2030. But I’m very skeptical that these modest results will trigger an anti-aging revolution that substantially affects lifespan and quality of life in the way that you have described.
Most generally, scientific fields tend to have diminishing marginal returns, since all the low-hanging fruit tends to get plucked early on. In the field of anti-aging, even the lowest-hanging fruit (i.e., the treatments you described) doesn’t seem very promising. At best, it might deliver an impact roughly equivalent to adding a decade or two of healthy life. At that level, human life would be meaningfully affected, but the millennia-old cycle of birth-to-death would remain almost unchanged.
Today, there are over 130 longevity biotechnology companies
From the perspective of altruistic neglectedness, this fact counts against anti-aging as a promising field to go into. The fact that there are 130 companies working on the problem with only minor laboratory success in the last decade indicates that the marginal returns to new inputs are low. One more researcher, or one more research grant, will add little to the rate of progress.
In my opinion, if robust anti-aging technologies do exist in say, 50 years, the most likely reason would be that overall technological progress sped up dramatically (for example, due to transformative AI), and progress in anti-aging was merely a side effect of this wave of progress.
It’s also possible that anti-aging science is a different kind of science than most fields, and that we have reason to expect a discontinuity in progress some time soon (for one potential argument, see the last several paragraphs of my post here). The problem is that this argument is vulnerable to the standard reply usually given against arguments for technological discontinuities: they’re rare.
(However I do recommend reading some material investigating the frequency of technological discontinuities here. Maybe you can find some similarities with past technological discontinuities? :) )
My understanding is that the correct line is something like, “The COVID-19 vaccines were developed and approved unprecedentedly fast, excluding influenza vaccines.” If you want to find examples of rapid vaccine development, you don’t need to go all the way back to the 1957 influenza pandemic. For the 2009 Swine flu pandemic,
Analysis of the genetic divergence of the virus in samples from different cases indicated that the virus jumped to humans in 2008, probably after June, and not later than the end of November,[38] likely around September 2008… By 19 November 2009, doses of vaccine had been administered in over 16 countries.
And more obviously, the flu shot is modified yearly to keep up-to-date with new variants. Wikipedia notes that influenza vaccines were first successfully distributed in the 1940s, after development began in 1931.
When considering vaccines other than the influenza shot, this 2017 EA forum post from Peter Wildeford is informative. He tracks the development history of “important” vaccines, as he notes,
This is not intended to be an exhaustive list of all vaccines, but is intended to be exhaustive of all vaccines that would be considered “important”, such as the vaccines on the WHO list of essential medicines and notable vaccines under current development.
His bottom line:
Taken together and weighing these three sources of evidence evenly, this suggests an average of 29 years for the typical vaccine.
No vaccine on his list had been researched, manufactured, and distributed in less than one year. The closest candidate is the rabies vaccine, which had a 4-year timeline, from 1881 to 1885.
Expanding on the Jacob Steinhardt quote from August 2021,
Current performance on this dataset is quite low--6.9%--and I expected this task to be quite hard for ML models in the near future. However, forecasters predict more than 50% accuracy* by 2025! This was a big update for me...
If I imagine an ML system getting more than half of these questions right, I would be pretty impressed. If they got 80% right, I would be super-impressed. The forecasts themselves predict accelerating progress through 2025 (21% in 2023, then 31% in 2024 and 52% in 2025), so 80% by 2028 or so is consistent with the predicted trend. This still just seems wild to me and I’m really curious how the forecasters are reasoning about this...Even while often expressing significant uncertainty, forecasters can make bold predictions. I’m still surprised that forecasters predicted 52% on MATH, when current accuracy is 7% (!). My estimate would have had high uncertainty, but I’m not sure the top end of my range would have included 50%. I assume the forecasters are right and not me, but I’m really curious how they got their numbers.
Google’s model obtained 50.3% on MATH, years ahead of schedule.
I am deeply worried about the prospect of a botched fire alarm response. In my opinion, the most likely result of a successful fire alarm would not be that society suddenly gets its act together and finds the best way to develop AI safely. Rather, the most likely result is that governments and other institutions implement very hasty and poorly thought-out policy, aimed at signaling that they are doing “everything they can” to prevent AI catastrophe. In practice, this means poorly targeted bans, stigmatization, and a redistribution of power from current researchers to bureaucratic agencies that EAs have no control over.
I don’t think this is right—the main hype effect of chatGPT over previous models feels like it’s just because it was in a convenient chat interface that was easy to use and free.
I don’t have extensive relevant expertise, but as a personal datapoint: I used Davinci-002 multiple times to generate an interesting dialogue in order to test its capabilities. I ran several small-scale Turing tests, and the results were quite unimpressive in my opinion. When ChatGPT came out, I tried it out (on the day of its release) and very quickly felt that it was qualitatively better at dialogue. Of course, I could have simply been prompting Davinci-002 poorly, but overall I’m quite skeptical that the main reason for ChatGPT hype was that it had a more convenient chat interface than GPT-3.
Very interesting post! However, I have a big disagreement with your interpretation of why the European conquerors succeeded in America, and I think that it undermines much of your conclusion.
In your section titled “What explains these devastating takeovers?” you cite technology and strategic ability, but Old World diseases, most notably smallpox, but also measles, influenza, typhus, and the bubonic plague, had already devastated communities in the Americas before the European invaders arrived. My reading of historians (from Charles Mann’s book 1493, to Alfred W. Crosby’s The Columbian Exchange and Jared Diamond’s Guns Germs and Steel) leads me to conclude that the historical consensus is that these takeovers were primarily due to Old World diseases, and had relatively little to do with technology or strategy per se.
In Chapter 11 of Guns Germs and Steel, Jared Diamond analyzes the European takeovers in America that you cite from the perspective of Old World diseases (the YouTuber CGP Grey has also made a video on the same topic). The basic thesis is that Europeans had acquired immunity from these diseases, whereas people in America hadn’t. From Wikipedia,
After first contacts with Europeans and Africans, some believe that the death of 90–95% of the native population of the New World was caused by Old World diseases.[43] It is suspected that smallpox was the chief culprit and responsible for killing nearly all of the native inhabitants of the Americas.
These diseases were endemic by the time that Cortes and Pizarro arrived on the continent, and therefore it seems very unlikely that their victory was achieved primarily from military and technological might. From Wikipedia again,
The Spanish Franciscan Motolinia left this description: “As the Indians did not know the remedy of the disease…they died in heaps, like bedbugs. In many places it happened that everyone in a house died and, as it was impossible to bury the great number of dead, they pulled down the houses over them so that their homes become their tombs.”[46] On Cortés’s return, he found the Aztec army’s chain of command in ruins. The soldiers who still lived were weak from the disease. Cortés then easily defeated the Aztecs and entered Tenochtitlán.[47] The Spaniards said that they could not walk through the streets without stepping on the bodies of smallpox victims
The effects of smallpox on Tahuantinsuyu (or the Inca empire) were even more devastating. Beginning in Colombia, smallpox spread rapidly before the Spanish invaders first arrived in the empire. The spread was probably aided by the efficient Inca road system. Within months, the disease had killed the Incan Emperor Huayna Capac, his successor, and most of the other leaders. Two of his surviving sons warred for power and, after a bloody and costly war, Atahualpa become the new emperor. As Atahualpa was returning to the capital Cuzco, Francisco Pizarro arrived and through a series of deceits captured the young leader and his best general. Within a few years smallpox claimed between 60% and 90% of the Inca population,[49] with other waves of European disease weakening them further.
The theory that disease was more important than technology is further supported empirically by the fact that Europeans were unable to conquer African tribes/civilizations until the late 19th century, long after the conquest of the New World, despite the fact that many African civilizations had similar or even lower technological capabilities compared to the Inca and Aztecs. The reason is that Africans had immunity to Old World diseases, unlike the native populations of the Americas. However, even in the 19th century conquests, historians often cite the development of the drug quinine, and thus immunity to disease, as one of the primary reasons why European civilizations were able to conquer African nations.
By contrast, I was only able to find one mention of smallpox in your entire post, and the place where you do mention it, you say
Smallpox sweeps through the land, killing many on all sides and causing general chaos.
If I’m reading “all sides” correctly, this is just flat-out incorrect. It killed mainly the indigenous Americans.
At one point you state that during Pizarro’s conquest,
The Inca empire is in the middle of a civil war and a devastating plague.
This “plague” was smallpox carried from earlier European travelers. Jared Diamond says
The reason for the civil war was that an epidemic of smallpox, spreading overland among South American Indians after its arrival with Spanish settlers in Panama and Colombia, had killed the Inca emperor Huayna Capac and most of his court around 1526, and then immediately killed his designated heir, Ninan Cuyuchi.
You may ask why there was an asymmetry: after all, didn’t the New World have diseases that Europeans were not immune to? Yes, but basically only syphilis. Europeans had exposure to many infectious diseases because those diseases had been acquired from livestock, but livestock was not an important component of American civilizations in the pre-Columbian period.
One reason why disease might not be salient in descriptions of the American conquest is because until modern times, historians emphasized explanations of events in terms of human factors, such as personalities of rulers and tendencies of groups of people. According to this source, it wasn’t until the 1960s that historians started to take seriously the idea that disease was the primary culprit in the destruction of American civilizations.
There still could be an analogous situation where AI develops diseases that kill humans but not AIs, but I think it’s worth exploring this type of existential risk in its own category, and emphasizing that this thesis does not depend on a historical precedent of conquerors having strategic or technological advantages.
(I might write a longer response later, but I thought it would be worth writing a quick response now. Cross-posted from the EA forum, and I know you’ve replied there, but I’m posting anyway.)
I have a few points of agreement and a few points of disagreement:
Agreements:
The strict counting argument seems very weak as an argument for scheming, essentially for the reason you identified: it relies on a uniform prior over AI goals, which seems like a really bad model of the situation.
The hazy counting argument—while stronger than the strict counting argument—still seems like weak evidence for scheming. One way of seeing this is, as you pointed out, to show that essentially identical arguments can be applied to deep learning in different contexts that nonetheless contradict empirical evidence.
Some points of disagreement:
I think the title overstates the strength of the conclusion. The hazy counting argument seems weak to me but I don’t think it’s literally “no evidence” for the claim here: that future AIs will scheme.
I disagree with the bottom-line conclusion: “we should assign very low credence to the spontaneous emergence of scheming in future AI systems—perhaps 0.1% or less”
I think it’s too early to be very confident in sweeping claims about the behavior or inner workings of future AI systems, especially in the long-run. I don’t think the evidence we have about these things is very strong right now.
One caveat: I think the claim here is vague. I don’t know what counts as “spontaneous emergence”, for example. And I don’t know how to operationalize AI scheming. I personally think scheming comes in degrees: some forms of scheming might be relatively benign and mild, and others could be more extreme and pervasive.
Ultimately I think you’ve only rebutted one argument for scheming—the counting argument. A more plausible argument for scheming, in my opinion, is simply that the way we train AIs—including the data we train them on—could reward AIs that scheme over AIs that are honest and don’t scheme. Actors such as AI labs have strong incentives to be vigilant against these types of mistakes when training AIs, but I don’t expect people to come up with perfect solutions. So I’m not convinced that AIs won’t scheme at all.
If by “scheming” all you mean is that an agent deceives someone in order to get power, I’d argue that many humans scheme all the time. Politicians routinely scheme, for example, by pretending to have values that are more palatable to the general public, in order to receive votes. Society bears some costs from scheming, and pays costs to mitigate the effects of scheming. Combined, these costs are not crazy-high fractions of GDP; but nonetheless, scheming is a constant fact of life.
If future AIs are “as aligned as humans”, then AIs will probably scheme frequently. I think an important question is how intensely and how pervasively AIs will scheme; and thus, how much society will have to pay as a result of scheming. If AIs scheme way more than humans, then this could be catastrophic, but I haven’t yet seen any decent argument for that theory.
So ultimately I am skeptical that AI scheming will cause human extinction or disempowerment, but probably for different reasons than the ones in your essay: I think the negative effects of scheming can probably be adequately mitigated by paying some costs even if it arises.
I don’t think you need to believe in any strong version of goal realism in order to accept the claim that AIs will intuitively have “goals” that they robustly attempt to pursue. It seems pretty natural to me that people will purposely design AIs that have goals in an ordinary sense, and some of these goals will be “misaligned” in the sense that the designer did not intend for them. My relative optimism about AI scheming doesn’t come from thinking that AIs won’t robustly pursue goals, but instead comes largely from my beliefs that:
AIs, like all real-world agents, will be subject to constraints when pursuing their goals. These constraints include things like the fact that it’s extremely hard and risky to take over the whole world and then optimize the universe exactly according to what you want. As a result, AIs with goals that differ from what humans (and other AIs) want, will probably end up compromising and trading with other agents instead of pursuing world takeover. This is a benign failure and doesn’t seem very bad.
The amount of investment we put into mitigating scheming is not an exogenous variable, but instead will respond to evidence about how pervasive scheming is in AI systems, and how big of a deal AI scheming is. And I think we’ll accumulate lots of evidence about the pervasiveness of AI scheming in deep learning over time (e.g. such as via experiments with model organisms of alignment), allowing us to set the level of investment in AI safety at a reasonable level as AI gets incrementally more advanced.
If we experimentally determine that scheming is very important and very difficult to mitigate in AI systems, we’ll probably respond by spending a lot more money on mitigating scheming, and vice versa. In effect, I don’t think we have good reasons to think that society will spend a suboptimal amount on mitigating scheming.
For you, our patented superintelligent prediction algorithm anticipated that you would want an account, so we already created one for you. Unfortunately, it also predicted that you would make very bad investments in literal galaxy-brain hot takes. Therefore, we decided to terminate your account.
It just seems very clear to me that the sort of person who is taken in by [Paul Christiano’s slow takeoff] essay is the same sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2.
We can very loosely test this hypothesis by asking whether predictors on Metaculus were surprised by these developments, since Metaculus tends to generally agree with Paul Christiano’s model (see here for example).
Unfortunately, we can’t make many inferences with the data available, as it’s too sparse. Still, I’m leaving the following information here in case people find it interesting.
AlphaGo. There were two questions on Metaculus about Go before AlphaGo beat Lee Sedol. The first was this question about whether an AI would beat a top human Go player in 2016. Before AlphaGo became widely known—following the announcement of its match against Fan Hui—the median prediction was around 30%. After the announcement, the probability shot up to 90%. Unfortunately, this can’t be taken to be much evidence that Metaculus impressively foresaw a breakthrough that year, since Demis Hassabis had already hinted at a breakthrough at the time the question was opened. (Before the matches, Metaculus put the chances of AlphaGo beating Lee Sedol at 64%).
GPT-3. It’s unclear what relevant metrics would have counted as “predicting GPT-3”. There was a question for the best Penn Treebank perplexity score in 2019 and it turned out Metaculus over-predicted progress (though this was mostly a failure in operationalization, see Daniel Filan’s post-mortem). Metaculus had generally anticipated a great increase in parameter counts for ML models in early 2020, as evidenced by this question. More generally, GPT-3 doesn’t seem like a good example of a discontinuity in machine learning progress in perplexity when looking at the benchmark data. It’s possible GPT-3 is a discontinuity from previous progress in some other, harder to measure sense, but I’m not currently aware of what that might be.
AlphaFold 2. Metaculus wasn’t generally very surprised by a breakthrough in protein folding prediction. Since early 2019, predictors placed a greater than 80% chance that a breakthrough would happen by 2031 (note, AlphaFold 2 technically doesn’t count as a “breakthrough” by the strict definition in the question criteria). However, it is probably true that Metaculus was surprised that it happened so early.
I’d suggest changing the title from “AI Girlfriends Won’t Matter Much” to “AI girlfriends won’t fundamentally alter the trend” since that’s closer to what I interpret you to be saying, and it’s more accurate. There are many things that allow long-run trends to continue while still “mattering” a lot in an absolute sense. For example, electricity likely didn’t substantially alter the per-capita GDP trajectory of the United States but I would strongly object to the thesis that “electricity doesn’t matter much”.
ETA: To clarify, I’m saying that electricity allowed the per capita GDP trend to continue, not that it had a negligible counterfactual effect on GDP.
Great question!
At the risk of derailing the discussion, asking that we temper our sanctions on Russia by measuring their cost to starving people in Africa looks a lot like an isolated demand for rigor.
Nearly all policy choices, both foreign and domestic, involve tradeoffs like this. Almost nobody ever says, “We shouldn’t choose that policy because we could have redirected money to help people in Sub-Saharan Africa instead.” Perhaps we should! But singling out the war in Ukraine, and our response to it, is too parochial for a productive discussion of what tradeoffs we should be willing to make.
It’s as good a time as any to re-iterate my reasons for disagreeing with what I see as the Yudkowskian view of future AI. What follows isn’t intended as a rebuttal of any specific argument in this essay, but merely a pointer I’m providing for readers, which may help explain why some people might disagree with the conclusion and reasoning contained within.
I’ll provide my cruxes point-by-point,
I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.
Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But since other animals, to a first approximation, can’t coordinate at all, this advantage won’t carry over to AGI in the same way: the first AGIs we construct will be born into a culture already capable of coordinating and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
Consequently, the first slightly smarter-than-human agent will probably not be able to leverage its raw intelligence to unilaterally take over the world, for pretty much the same reason that an individual human would not be able to unilaterally take over a band of chimps, in the state of nature, despite the intelligence advantage of the human.
There’s a large range of human intelligence, such that it makes sense to talk about AI slowly going from 50th percentile to 99.999th percentile on pretty much any important general intellectual task, rather than AI suddenly jumping to superhuman levels after a single major insight. In cases where progress in performance does happen rapidly, the usual reason is that there wasn’t much effort previously being put into getting better at the task.
The case of AlphaGo is instructive here: improving the SOTA on Go bots is not very profitable. We should expect, therefore, that there will be relatively few resources being put into that task, compared to the overall size of the economy. However, if a single rich company, like Google, at some point does decide to invest considerable resources into improving Go performance, then we could easily observe a discontinuity in progress. Yet, this discontinuity in output merely reflects a discontinuity in inputs, not a discontinuity as a response to small changes in those inputs, as is usually a prerequisite for foom in theoretical models.
Hardware progress and experimentation are much stronger drivers of AI progress than novel theoretical insights. The most impressive insights, like backpropagation and transformers, are probably in our past. And as the field becomes more mature, it will likely become even harder to make important theoretical discoveries.
These points make the primacy of recursive self-improvement, and as a consequence, unipolarity in AI takeoff, less likely in the future development of AI. That’s because hardware progress and AI experimentation are, for the most part, society-wide inputs, which can be contributed by a wide variety of actors, don’t exhibit strong feedback loops on an individual level, and more-or-less have smooth responses to small changes in their inputs. Absent some way of making AI far better via a small theoretical tweak, it seems that we should expect smooth, gradual progress by default, even if overall economic growth becomes very high after the invention of AGI.
[Update (June 2023): While I think these considerations are still important, I think the picture I painted in this section was misleading. I wrote about my views of AI services here.] There are strong pressures—including the principle of comparative advantage, diseconomies of scale, and gains from specialization—that incentivize making economic services narrow and modular, rather than general and all-encompassing. Illustratively, a large factory where each worker specializes in their particular role will be much more productive than a factory in which each worker is trained to be a generalist, even though no one understands any particular component of the production process very well.
What is true in human economics will apply to AI services as well. This implies we should expect something like Eric Drexler’s AI perspective, which emphasizes economic production across many agents who trade and produce narrow services, as opposed to monolithic agents that command and control.
Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it. The current lack of concern almost certainly reflects the fact that powerful AI hasn’t arrived yet.
There’s a long history of people regulating industries after disasters—like nuclear energy—and, given the above theses, it seems likely that there will be at least a few “warning shots” which will provide a trigger for companies and governments to crack down and invest heavily into making things go the way they want.
(Note that this does not imply any sort of optimism about the effects of these regulations, only that they will exist and will have a large effect on the trajectory of AI)
The effect of the above points is not to provide us uniform optimism about AI safety, and our collective future. It is true that, if we accept the previous theses, then many of the points in Eliezer’s list of AI lethalities become far less plausible. But, equally, one could view these theses pessimistically, by thinking that they imply the trajectory of future AI is much harder to intervene on, and do anything about, relative to the Yudkowskian view.