Software engineer from Ireland who’s interested in EA and AI safety research.
Stephen McAleese
It’s not obvious to me that creating super smart people would have a net positive effect because motivating them to decrease AI risk is itself an alignment problem. What if they instead decide to accelerate AI progress or do nothing at all?
in order for us to hit that date things have to start getting weird now.
I don’t think this is necessary. Isn’t the point of exponential growth that a period of apparent normalcy can be followed by rapid, dramatic change? Example: the area covered by lilypads on a pond doubles each day but only becomes noticeable in the last several doublings.
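To make the toy example concrete (a minimal sketch with made-up numbers, not anything from the original post):

```python
# Illustrative only: a pond where the lilypad-covered area doubles each day.
coverage = 1e-6  # arbitrary starting fraction of the pond covered on day 0
day = 0
while coverage < 1.0:
    day += 1
    coverage = min(coverage * 2, 1.0)
    if coverage > 0.01:  # only print days where the coverage is noticeable
        print(f"day {day}: {coverage:.0%} covered")

# The pond goes from ~2% covered to fully covered in the final 7 of 20 days,
# even though the doubling has been running the whole time.
```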
Epic post. It reminds me of “AGI Ruin: A List of Lethalities” except it’s more focused on AI timelines rather than AI risk.
At 86.4%, GPT-4’s accuracy is now approaching 100%, but GPT-3’s accuracy, which was my prior, was only 43.9%. Obviously one would expect GPT-4’s accuracy to be higher than GPT-3’s since it wouldn’t make sense for OpenAI to release a worse model, but it wasn’t clear ex ante that GPT-4’s accuracy would be near 100%.
I predicted that GPT-4’s accuracy would fall short of 100% by 20.6 percentage points when the true shortfall was 13.6 percentage points. Using this approach, the error would be $20.6\% - 13.6\% = 7$ percentage points.
Strictly speaking, the formula for percent error according to Wikipedia is the relative error expressed as a percentage: $\delta = \left|\frac{v_{\text{true}} - v_{\text{predicted}}}{v_{\text{true}}}\right| \times 100\%$.
I think this is the correct formula to use because what I’m trying to measure is the deviation of the true value from the regression line (predicted value).
Using the formula, the percent error is $\left|\frac{86.4 - 79.4}{86.4}\right| \times 100\% \approx 8.1\%$.
I updated the post to use the term ‘percent error’ with a link to the Wikipedia page and a value of 8.1%.
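For concreteness, here’s the same calculation as a short Python sketch (the helper function name is mine; the accuracy figures are the ones quoted above):

```python
def percent_error(true_value: float, predicted_value: float) -> float:
    """Relative error expressed as a percentage (the Wikipedia definition)."""
    return abs(true_value - predicted_value) / abs(true_value) * 100

gpt4_accuracy = 86.4       # measured accuracy (%), i.e. a shortfall of 13.6 points
predicted_accuracy = 79.4  # predicted accuracy (%), i.e. a shortfall of 20.6 points

print(percent_error(gpt4_accuracy, predicted_accuracy))  # ≈ 8.1
```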
“Having thought about each of these milestones more carefully, and having already updated towards short timelines months ago”
You said that you updated and shortened your median timeline to 2047 and mode to 2035. But it seems to me that you need to shorten your timelines again.
The It’s time for EA leadership to pull the short-timelines fire alarm post says:
“it seems very possible (>30%) that we are now in the crunch-time section of a short-timelines world, and that we have 3-7 years until Moore’s law and organizational prioritization put these systems at extremely dangerous levels of capability.”
It seems that the purpose of the bet was to test this hypothesis:
“we are offering to bet up to $1000 against the idea that we are in the “crunch-time section of a short-timelines”
My understanding is that if AI progress occurred slowly and no more than one of the advancements listed were made by 2026-01-01 then this short timelines hypothesis would be proven false and could then be ignored.
However, the bet was conceded on 2023-03-16, which is much earlier than the deadline, so the bet failed to prove the hypothesis false.
It seems to me that the rational action now is to update toward believing that this short-timelines hypothesis is true. And 3-7 years from 2022 is 2025-2029, which is substantially earlier than 2047.
Strong upvote. I think the methods used in this post are very promising for accurately forecasting TAI for the reasons explained below.
While writing GPT-4 Predictions I spent a lot of time playing around with the parametric scaling law L(N, D) from Hoffmann et al. 2022 (the Chinchilla paper). In the post, I showed that scaling laws can be used to calculate model losses and that these losses seem to correlate well with performance on the MMLU benchmark. My plan was to write a post extrapolating the progress further to TAI until I read this post which has already done that!
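For reference, here’s roughly what that calculation looks like as a short Python sketch (the constants are the fitted values reported in Hoffmann et al. 2022 as I understand them, so treat the exact numbers as approximate):

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Parametric scaling law L(N, D) from Hoffmann et al. 2022."""
    E = 1.69                # estimated irreducible loss of natural text
    A, alpha = 406.4, 0.34  # parameter-count term
    B, beta = 410.7, 0.28   # data term
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: a Chinchilla-scale model (70B parameters, 1.4T tokens).
print(chinchilla_loss(70e9, 1.4e12))  # ≈ 1.94 nats/token
```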
Scaling laws for language models seem to me like possibly the most effective option we have for forecasting TAI accurately for several reasons:
It seems as though the closest ML models to TAI that currently exist are language models and therefore predictive uncertainty should be lower for forecasting TAI from language models than from other types of less capable models.
A lot of economically valuable work such as writing and programming involves text and therefore language models tend to excel at these kinds of tasks.
The simple training objective of language models makes it easier to reason about their properties and capabilities. Also, despite their simple training objective, large language models demonstrate impressive levels of generalization and even reasoning (e.g. chain-of-thought prompting).
Language model scaling laws are well-studied and highly accurate for predicting language model losses.
There are many existing examples of language models and their capabilities. Previous capabilities can be used as a baseline for predicting future capabilities.
Overall my intuition is that language model scaling laws require far fewer assumptions and much less guesswork for forecasting TAI and therefore should allow narrower and more confident predictions, which your post seems to show (<10 OOM vs 20 OOM for the bio anchors method).
As I mentioned in this post there are limitations to using scaling laws such as the possibility of sudden emergent capabilities and the difficulty of predicting algorithmic advances.
Exceptions include deep RL work by DeepMind such as AlphaTensor.
I don’t agree with the first point:
“a score of 80% would not even indicate high competency at any given task”
Although the MMLU task is fairly straightforward given that there are only 4 options to choose from (25% accuracy for random guessing) and experts typically score about 90%, getting 80% accuracy still seems quite difficult for a human, given that average human raters only score about 35%. Also, GPT-3 only scores about 45% (fine-tuned GPT-3 still only scores 54%), and GPT-2 scores just 32% even when fine-tuned.
One of my recent posts has a nice chart showing different levels of MMLU performance.
Extract from the abstract of the paper (2021):
“To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability. We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over random chance by almost 20 percentage points on average.”
Note that, unlike GPT-2, GPT-3 does use some sparse attention. The GPT-3 paper says the model uses “alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer”.
Wow, this is an incredible achievement given that AI safety is still a relatively small field. For example, this post by 80,000 Hours said that $10-$50 million was spent globally on AI safety in 2020, according to The Precipice. Therefore this grant is roughly equivalent to an entire year of global AI safety funding!
I think this is a really interesting post and seems like a promising and tractable way to accelerate alignment research. It reminds me of Neuralink but seems more feasible at present. I also like how the post emphasizes differentially accelerating alignment because I think one of the primary risks of any kind of augmentation is that it just globally accelerates progress and has no net positive impact.
One sentence that seemed like a misdefinition to me was the way the concept of a genie was defined:
An antithetical example to this is something like a genie, where the human outsources all of their agency to an external system that is then empowered to go off and optimize the world.
To me, this sounds more like a ‘sovereign’ as defined in Superintelligence whereas a genie just executes a command before waiting for the next command. Though the difference doesn’t seem that big since both types of systems take action.
A key concept I thought was missing was Amdahl’s Law, which is a formula that calculates the maximum theoretical speedup of a computation given the fraction of the computation that can be parallelized. The formula is $S_{\max} = \frac{1}{1 - p}$, where $p$ is the parallelizable fraction. I think it’s also relevant here: if 50% of the work can be delegated to a model, the maximum speedup is a factor of 2 because then there will only be half as much work for the human to do. If 90% can be delegated, the maximum speedup is 10.
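A quick sketch of that calculation (illustrative only; the function name is mine):

```python
def max_speedup(delegable_fraction: float) -> float:
    """Amdahl's law upper bound: the delegated work takes zero time,
    so total time is limited by the fraction the human still has to do."""
    return 1 / (1 - delegable_fraction)

print(max_speedup(0.5))  # 2.0 -- half the work delegated, at most 2x faster
print(max_speedup(0.9))  # ≈ 10 -- 90% delegated, at most 10x faster
```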
Also, maybe it would be valuable to have more thinking focused on the human component of the system: ideas about productivity, cognitive enhancement, or alignment. Though I think these ideas are beyond the scope of the post.
For purposes of this post, I am defining AGI as something that can (i) outperform average trained humans on 90% of tasks and (ii) will not routinely produce clearly false or incoherent answers.
Based on this definition, it seems like AGI almost or already exists. ChatGPT is arguably already an AGI because it can, for example, score 1000 on the SAT, which is at the average human level.
I think a better definition would be a model that can outperform professionals at most tasks. For example, a model that’s better at writing than a New York Times human writer.
To be sure, I think the chance that AGI will be developed before January 1, 2029 is still low, on the order of 3% or so; but there is a pretty vast difference between small but measurable and “not going to happen”.
Even if one doesn’t believe ChatGPT is an AGI, it doesn’t seem like we need much additional progress to create a model that can outperform the average human at most tasks.
I personally think there is a ~50% chance of this level of AGI being achieved by 2030.
I’ve seen some of the screenshots of Bing Chat. It seems impressive and possibly more capable than ChatGPT but I’m not sure. Here’s what Microsoft has said about Bing Chat:
“We’re excited to announce the new Bing is running on a new, next-generation OpenAI large language model that is more powerful than ChatGPT and customized specifically for search. It takes key learnings and advancements from ChatGPT and GPT-3.5 – and it is even faster, more accurate and more capable.”
If the model is more powerful than GPT-3.5 then maybe it’s GPT-4, but “more powerful” is too vague a phrase to draw any clear conclusions from. I don’t think I have enough information at this point to make strong claims about it, so I think we’ll have to wait and see.
Thanks for bringing this up. I don’t think I mentioned any algorithmic improvements apart from RETRO so these predictions are probably somewhat conservative.
You were right. I forgot the 1B parameter model row so the table was shifted by an order of magnitude. I updated the table so it should be correct now. Thanks for spotting the mistake.
I think the word ‘taunt’ anthropomorphizes Bing Chat a bit too much, given that, according to Google, a taunt is “a remark made in order to anger, wound, or provoke someone”.
While I don’t think Bing Chat has the same anger and retributive instincts as humans, it could in theory simulate them, given that its training dataset presumably contains angry messages and it uses its chat history to predict and generate future messages.
Thanks for spotting this.
I noticed that the formula I originally used didn’t match the way it’s written in the OpenAI paper Scaling Laws for Neural Language Models (2020), so I updated the equation.
The amount of compute used during training is approximately proportional to the product of the number of parameters and the amount of training data: $C \approx 6ND$, where $N$ is the number of parameters and $D$ is the number of training tokens.
Where there is a conflict between this formula and the table, I think the table should be used because it’s based on empirical results whereas the formula is more like a rule of thumb.
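As a rough sanity check of the rule of thumb (a sketch using the commonly cited GPT-3 figures of roughly 175B parameters and 300B training tokens):

```python
def training_compute(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs using the C ≈ 6ND rule of thumb."""
    return 6 * n_params * n_tokens

# GPT-3: ~175B parameters trained on ~300B tokens.
print(f"{training_compute(175e9, 300e9):.2e}")  # ≈ 3.15e+23 FLOPs
```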
Thanks for spotting the typo! I updated the post.
It’s great to see that Sam cares about AI safety, is willing to engage with the topic, and has clear, testable beliefs about it. Some paragraphs from the interview that I found interesting and relevant to AI safety:
“One of the things we really believe is that the most responsible way to put this out in society is very gradually and to get people, institutions, policy makers, get them familiar with it, thinking about the implications, feeling the technology, and getting a sense for what it can do and can’t do very early. Rather than drop a super powerful AGI in the world all at once.”
“The world I think we’re heading to and the safest world, the one I most hope for, is the short timeline slow takeoff.”
“I think there will be many systems in the world that have different settings of the values that they enforce. And really what I think—and this will take longer—is that you as a user should be able to write up a few pages of here’s what I want here are my values here’s how I want the AI to behave and it reads it and thinks about it and acts exactly how you want. Because it should be your AI and it should be there to serve you and do the things you believe in.”
“multiple AGIs in the world I think is better than one.”
“I think the best case is so unbelievably good that it’s hard to—it’s like hard for me to even imagine.”
“And the bad case—and I think this is important to say—is like lights out for all of us. I’m more worried about an accidental misuse case in the short term where someone gets a super powerful—it’s not like the AI wakes up and decides to be evil. I think all of the traditional AI safety thinkers reveal a lot more about themselves than they mean to when they talk about what they think the AGI is going to be like. But I can see the accidental misuse case clearly and that’s super bad. So I think it’s like impossible to overstate the importance of AI safety and alignment work. I would like to see much much more happening.”
“But I think it’s more subtle than most people think. You hear a lot of people talk about AI capabilities and AI alignment as in orthogonal vectors. You’re bad if you’re a capabilities researcher and you’re good if you’re an alignment researcher. It actually sounds very reasonable, but they’re almost the same thing. Deep learning is just gonna solve all of these problems and so far that’s what the progress has been. And progress on capabilities is also what has let us make the systems safer and vice versa surprisingly. So I think none of the sort of sound-bite easy answers work”
“I think the AGI safety stuff is really different, personally. And worthy of study as its own category. Because the stakes are so high and the irreversible situations are so easy to imagine we do need to somehow treat that differently and figure out a new set of safety processes and standards.”
Here is my summary of Sam Altman’s beliefs about AI and AI safety as a list of bullet points:
Sub-AGI models should be released soon and gradually increased in capability so that society can adapt and the models can be tested.
Many people believe that AI capabilities and AI safety are orthogonal vectors but they are actually highly correlated and this belief is confirmed by recent advances. Advances in AI capabilities advance safety and vice-versa.
To align AI, it should be possible to write down our values and ask AGIs to read these instructions and behave according to them. Using this approach, we could have AIs tailored to each individual.
There should be multiple AGIs in the world with a diversity of different settings and values.
The upside of AI is extremely positive and potentially utopian. The worst-case scenarios are extremely negative and include scenarios involving human extinction.
Sam is more worried about accidents than the AI itself acting maliciously.
I agree that AI models should gradually increase in capabilities so that we can study their properties and think about how to make them safe.
Sam seems to believe that the orthogonality thesis is false in practice. For reference, here is the definition of the orthogonality thesis:
“Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.”
The classic example is the superintelligent paperclip AI that only cares about paperclips. I think the idea is that capabilities and alignment are independent and that scaling capabilities will not by itself lead to more alignment. However, in practice, AI researchers are trying to scale both capabilities and alignment.
I think the orthogonality thesis is true but it doesn’t seem very useful. I believe any combination of intelligence and goals is possible but we also want to know which combinations are most likely. In other words, we want to know the strength of the correlation between capabilities and alignment which may depend on the architecture used.
I think recent events have actually shown that capabilities and alignment are not necessarily correlated. For example, GPT-3 was powerful but not particularly aligned and OpenAI had to come up with new and different methods such as RLHF to make it more aligned.
Sam seems to believe that the value loading problem will be easy and that we can simply ask the AI for what we want. I think it will become easier to create AIs that can understand our values. But whether there is any correlation between understanding and caring seems like a different and more important question. Future AIs could be like an empathetic person who understands and cares about what we want or they could use this understanding to manipulate and defeat humanity.
I’m skeptical about the idea that multiple AGIs would be desirable. According to the book Superintelligence, race dynamics and competition would tend to be worse with more actors building AGI. Actions such as temporarily slowing down global AI development would be more difficult to coordinate with more teams building AGI.
I agree with the idea that the set of future AGI possibilities includes a wide range of possible futures including very positive and negative outcomes.
In the short term, I’m more worried about accidents and malicious actors but when AI exceeds human intelligence, it seems like the source of most of the danger will be the AI’s own values and decision-making.
Of these points, the beliefs I’m most uncertain about are:
The strength of the correlation, if any, between capabilities and alignment in modern deep learning systems.
Whether we can program AI models to understand and care about (adopt) our values simply by prompting them with a description of our values.
Whether one or many AGIs would be most desirable.
Whether we should be more worried about AI accidents and misuse or the AI itself being a dangerous agent.
In this case, the percent error is 8.1% and the absolute error is 8 percentage points. If one student gets 91% on a test and another gets 99%, they both get an A, so the difference doesn’t seem large to me.
The article linked seems to be missing. Can you explain your point in more detail?