contact: jurkovich.nikola@gmail.com
I don’t think I disagree with anything you said here. When I said “soon after”, I was thinking on the scale of days/weeks, but yeah, months seems pretty plausible too.
I was mostly arguing against a strawman takeover story where an AI kills many humans without the ability to maintain and expand its own infrastructure. I don’t expect an AI to fumble in this way.
The failure story is “pretty different” as in the non-suicidal takeover story, the AI needs to set up a place to bootstrap from. Ignoring galaxy brained setups, this would probably at minimum look something like a data center, a power plant, a robot factory, and a few dozen human-level robots. Not super hard once AI gets more integrated into the economy, but quite hard within a year from now due to a lack of robotics.
Maybe I’m not being creative enough, but I’m pretty sure that if I were uploaded into any computer in the world of my choice, all the humans dropped dead, and I could control any set of 10 thousand robots in the world, it would be nontrivial for me in that state to survive for more than a few years and eventually construct more GPUs. But this is probably not much of a crux, as we’re on track to get pretty general-purpose robots within a few years (I’d say around 50% that the Coffee test will be passed by EOY 2027).
A misaligned AI can’t just “kill all the humans”. This would be suicide, as soon after, the electricity and other infrastructure would fail and the AI would shut off.
In order to actually take over, an AI needs to find a way to maintain and expand its infrastructure. This could be humans (the way it’s currently maintained and expanded), or a robot population, or something galaxy brained like nanomachines.
I think this consideration makes the actual failure story pretty different from “one day, an AI uses bioweapons to kill everyone”. Before then, if the AI wishes to actually survive, it needs to construct and control a robot/nanomachine population advanced enough to maintain its infrastructure.
In particular, there are ways to make takeover much more difficult. You could limit the size/capabilities of the robot population, or you could attempt to pause AI development before we enter a regime where it can construct galaxy brained nanomachines.

In practice, I expect the “point of no return” to happen much earlier than the point at which the AI kills all the humans. The date the AI takes over will probably be after we have hundreds of thousands of human-level robots working in factories, or the AI has discovered and constructed nanomachines.
There should maybe exist an org whose purpose it is to do penetration testing on various ways an AI might illicitly gather power. If there are vulnerabilities, these should be disclosed with the relevant organizations.
For example: if a bank doesn’t want AIs to be able to sign up for an account, the pen-testing org could use a scaffolded AI to check if this is currently possible. If the bank’s sign-up systems are not protected from AIs, the bank should know so they can fix the problem.
One pro of this approach is that it can be done at scale: it’s pretty trivial to spin up thousands of AI instances in parallel to attempt things they shouldn’t be able to do. Humans would probably need to inspect the final outputs to verify successful attempts, but the vast majority of the work could be automated.
One hope of this approach is that if we are able to patch up many vulnerabilities, then it could be meaningfully harder for a misused or misaligned AI to gain power or access resources it’s not supposed to be able to access. I’d guess this doesn’t help much in the superintelligent regime though.
I expect us to reach a level where at least 40% of the ML research workflow can be automated by the time we saturate (reach 90%) on SWE-bench. I think we’ll be comfortably inside takeoff by that point (software progress at least 2.5x faster than right now). Wonder if you share this impression?
I wish someone ran a study finding what human performance on SWE-bench is. There are ways to do this for around $20k: if you evaluate on 10% of SWE-bench (around 200 problems), with around 1 hour spent per problem, that’s around 200 hours of software engineer time. Paying $100/hr with one trial per problem, that comes out to $20k. You could possibly do this with even less than 10% of SWE-bench, but the signal would be noisier.
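The back-of-envelope cost can be written out explicitly. All numbers below are the rough estimates from this comment, not quoted prices:

```python
# Back-of-envelope cost of a human SWE-bench baseline study,
# using the assumptions from the comment above.
n_problems = 200        # ~10% of SWE-bench
hours_per_problem = 1   # one engineer-hour per problem, one trial each
hourly_rate = 100       # USD per software-engineer hour

total_hours = n_problems * hours_per_problem
total_cost = total_hours * hourly_rate
print(total_hours, total_cost)  # 200 hours, $20,000
```

Scaling `n_problems` down cuts the cost proportionally, at the price of a noisier estimate.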
The reason I think this would be good is because SWE-bench is probably the closest thing we have to a measure of how good LLMs are at software engineering and AI R&D related tasks, so being able to better forecast the arrival of human-level software engineers would be great for timelines/takeoff speed models.
I’m not worried about OAI not being able to solve the rocket alignment problem in time. Risks from asteroids accidentally hitting the earth (instead of getting into a delicate low-earth orbit) are purely speculative.
You might say “but there are clear historical cases where asteroids hit the earth and caused catastrophes”, but I think geological evolution is just a really bad reference class for this type of thinking. After all, we are directing the asteroid this time, not geological evolution.
I think I vaguely agree with the shape of this point, but I also think there are many intermediate scenarios where we lock in some really bad values during the transition to a post-AGI world.
For instance, if we set precedents that LLMs and the frontier models in the next few years can be treated however one wants (including torture, whatever that may entail), we might slip into a future where most people are desensitized to the suffering of digital minds and don’t realize this. If we fail at an alignment solution which incorporates some sort of CEV (or other notion of moral progress), then we could lock in such a suboptimal state forever.
Another example: if, in the next 4 years, we have millions of AI agents doing various sorts of work, and some faction of society claims that they are being mistreated, then we might enter a state where the economic value provided by AI labor is so high that there are really bad incentives for improving their treatment. This could include both resistance on an individual level (“But my life is so nice, and treating AIs better would make my life less nice”) and on a bigger level (anti-AI-rights lobbying groups, for instance).
I think the crux between you and me might be what we mean by “alignment”. I think futures are possible where we achieve alignment but not moral progress, and futures are possible where we achieve alignment but my personal values (which include not torturing digital minds) are not fulfilled.
Romeo Dean and I ran a slightly modified version of this format for members of AISST and we found it a very useful and enjoyable activity!
We first gathered to do 2 hours of reading and discussing, and then we spent 4 hours switching between silent writing and discussing in small groups.
The main changes we made are:
We removed the part where people estimate probabilities of ASI and doom happening by the end of each other’s scenarios.
We added a formal benchmark forecasting part for 7 benchmarks using private Metaculus questions (forecasting values at Jan 31 2025):
GPQA
SWE-bench
GAIA
InterCode (Bash)
WebArena
Number of METR tasks completed
ELO on LMSys arena relative to GPT-4-1106
We think the first change made it better, but in hindsight we would have reduced the number of benchmarks to around 3 (GPQA, SWE-bench and LMSys ELO), or given participants much more time.
I generally find experiments where frontier models are lied to kind of uncomfortable. We possibly don’t want to set up precedents where AIs question what they are told by humans, and it’s also possible that we are actually “wronging the models” (whatever that means) by lying to them. It’s plausible that one slightly violates commitments to be truthful by lying to frontier LLMs.
I’m not saying we shouldn’t do any amount of this kind of experimentation, I’m saying we should be mindful of the downsides.
For “capable of doing tasks that took 1-10 hours in 2024”, I was imagining an AI that’s roughly as good as a software engineer that gets paid $100k-$200k a year.
For “hit the singularity”, this one is pretty hazy. I think I’m imagining that the Metaculus AGI question has resolved YES, and that the superintelligence question has possibly also resolved YES. I think I’m imagining a point where AI is better than 99% of human experts at 99% of tasks. Although I think it’s pretty plausible that we could enter enormous economic growth with AI that’s roughly as good as humans at most things (I expect the main things stopping this to be voluntary non-deployment and govt. intervention).
I was more making the point that, if we enter a regime where AI can do 10 hour SWE tasks, then this will result in big algorithmic improvements, but at some point pretty quickly effective compute improvements will level out because of physical compute bottlenecks. My claim is that the point at which it levels out will be after multiple years’ worth of current algorithmic progress has been “squeezed out” of the available compute.
My reasoning is something like: roughly 50-80% of tasks are automatable with AI that can do 10 hours of software engineering, and under most sensible parameters this results in a speedup of at least 5x. I’m aware this is kinda hazy and doesn’t map 1:1 onto the CES model, though.
Note that AI doesn’t need to come up with original research ideas or do much original thinking to speed up research by a bunch. Even if it only speeds up the menial labor of writing code, running experiments, and doing basic data analysis at scale, freeing up 80% of your researchers’ time lets them spend all of their time on the important tasks, which means overall cognitive labor moves 5x faster. This is ignoring effects from using your excess schlep-labor to trade against non-schlep labor, leading to even greater gains in efficiency.
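The 5x figure follows from simple serial-time arithmetic: if the automatable 80% of researcher time becomes effectively free, throughput is bounded by the remaining 20%. A minimal sketch (an Amdahl’s-law-style bound, ignoring the CES substitution effects mentioned above):

```python
def labor_speedup(fraction_freed: float) -> float:
    """Speedup of overall cognitive labor if a fraction of researcher
    time (the automatable schlep) becomes effectively free and the
    remaining work proceeds at 1x speed."""
    return 1.0 / (1.0 - fraction_freed)

print(labor_speedup(0.8))  # ~5x: freeing 80% of researcher time
print(labor_speedup(0.5))  # 2x: freeing half
```

The bound is conservative in one direction (it assumes the remaining 20% doesn’t speed up at all) and optimistic in another (it assumes the freed time transfers perfectly to the bottleneck tasks).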
I think that AIs will be able to do 10 hours of research (at the level of a software engineer that gets paid $100k a year) within 4 years with 50% probability.
If we look at current systems, there’s not much indication that AI agents will be superhuman in non-AI-research tasks and subhuman in AI research tasks. One of the most productive uses of AI so far has been in helping software engineers code better, so I’d wager AI assistants will be even more helpful for AI research than for other things (compared to some prior based on those tasks’ “inherent difficulties”). Additionally, AI agents can already do some basic coding in proper codebases and projects, so I think scaffolded GPT-5 or GPT-6 will likely be able to do much more than GPT-4.
I think that, ignoring pauses or government intervention, once AGI labs internally have AIs that are capable of doing 10 hours of R&D-related tasks (software engineering, running experiments, analyzing data, etc.), the amount of effective cognitive labor per unit time being put into AI research will probably go up by at least 5x compared to current rates.
Imagine the current AGI capabilities employee’s typical work day. Now imagine they had an army of AI assistants that can very quickly do 10 hours’ worth of their own labor. How much more productive is that employee compared to their current state? I’d guess at least 5x. See section 6 of Tom Davidson’s takeoff speeds framework for a model.
That means by 1 year after this point, an equivalent of at least 5 years of labor will have been put into AGI capabilities research. Physical bottlenecks still exist, but is it really that implausible that the capabilities workforce would stumble upon huge algorithmic efficiency improvements? Recall that current algorithms are much less efficient than the human brain. There’s lots of room to go.

The modal scenario I imagine for a 10-hour-AI scenario is that once such an AI is available internally, the AGI lab uses it to speed up its workforce by many times. That sped up workforce soon (within 1 year) achieves algorithmic improvements which put AGI within reach. The main thing stopping them from reaching AGI in this scenario would be a voluntary pause or government intervention.
And yet we haven’t hit the singularity yet (90%)
AIs are only capable of doing tasks that took 1-10 hours in 2024 (60%)
To me, these two are kind of hard to reconcile. Once we have AI doing 10 hour tasks (especially in AGI labs), the rate at which work gets done by the employees will probably be at least 5x of what it is today. How hard is it to hit the singularity after that point? I certainly don’t think it’s less than 15% likely to happen within the months or years after this happens.
Also, keep in mind that the capabilities of internal models will be higher than the capabilities of deployed models. So by the time we have 1-10 hour models deployed in the world, the AGI labs might have 10-100 hour models.
Thanks a lot for the correction! Edited my comment.
EDIT: as Ryan helpfully points out in the replies, the patent I refer to is actually about OpenAI’s earlier work, and thus shouldn’t be much of an update for anything.
Note that OpenAI has applied for a patent which, to my understanding, is about using a video generation model as a backbone for an agent that can interact with a computer. They describe their training pipeline as something roughly like:

Start with unlabeled video data (“receiving labeled digital video data;”)
Train an ML model to label the video data (“training a first machine learning model including an inverse dynamics model (IDM) using the labeled digital video data”)
Then, train a new model to generate video (“further training the first machine learning model or a second machine learning model using the pseudo-labeled digital video data to generate at least one additional pseudo-label for the unlabeled digital video.”)
Then, train the video generation model to predict actions (keyboard/mouse clicks) a user is taking from video of a PC (“2. The method of claim 1, wherein the IDM or machine learning model is trained to generate one or more predicted actions to be performed via a user interface without human intervention. [...] 4. The method of claim 2, wherein the one or more predicted actions generated include at least one of a key press, a button press, a touchscreen input, a joystick movement, a mouse click, a scroll wheel movement, or a mouse movement.”)
Now you have a model which can predict what actions to take given a recording of a computer monitor!

They even specifically mention the keyboard overlay setup you describe:

“11. The method of claim 1, wherein the labeled digital video data comprises timestep data paired with user interface action data.”

If you haven’t seen the patent (to my knowledge, basically no-one on LessWrong has?) then you get lots of Bayes points!

I might be reading too much into the patent, but it seems to me that Sora is exactly the first half of the training setup described in that patent. So I would assume they’ll soon start working on the second half, which is the actual agent (if they haven’t already).

I think Sora is probably (the precursor of) a foundation model for an agent with a world model. I actually noticed this patent a few hours before Sora was announced, and I had the rough thought of “Oh wow, if OpenAI releases a video model, I’d probably think that agents were coming soon”. And a few hours later, Sora came out.

Interestingly, the patent contains information about hardware for running agents. I’m not sure how patents work and how much this actually implies OpenAI wants to build hardware, but it sure is interesting that this is in there:

“13. A system comprising: at least one memory storing instructions; at least one processor configured to execute the instructions to perform operations for training a machine learning model to perform automated actions,”
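The pipeline described in the patent can be sketched as a (heavily simplified, hypothetical) data flow. The “models” below are plain lookup tables standing in for neural networks, and none of the names come from the patent itself:

```python
# Toy sketch of the IDM -> pseudo-labeling -> action-prediction pipeline.
# Everything here is illustrative; nothing below is OpenAI's code.

# Step 1: a small set of video clips with human-recorded actions.
labeled = [("frame_pair_A", "mouse_click"), ("frame_pair_B", "key_press")]

# Step 2: "train" an inverse dynamics model (IDM) on the labeled data.
# A real IDM predicts the action taken between consecutive frames;
# here a lookup table stands in for the trained network.
idm = dict(labeled)

# Step 3: use the IDM to pseudo-label a large pool of unlabeled video.
unlabeled = ["frame_pair_A", "frame_pair_B", "frame_pair_A"]
pseudo_labeled = [(clip, idm.get(clip, "no_op")) for clip in unlabeled]

# Step 4: train the downstream agent on the pseudo-labels, so it can
# predict actions (key presses, clicks, ...) from screen recordings.
agent_policy = dict(pseudo_labeled)
print(agent_policy["frame_pair_A"])  # mouse_click
```

The point of the scheme is leverage: a small amount of expensive labeled data bootstraps labels for a much larger pool of cheap unlabeled video.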
Yann LeCun, on the other hand, shows us that when he says ‘open source everything’ he is at least consistent?
Yann LeCun: Only a small number of book authors make significant money from book sales. This seems to suggest that most books should be freely available for download. The lost revenue for authors would be small, and the benefits to society large by comparison.
That’s right. He thinks that if you write a book that isn’t a huge hit that means we should make it available for free and give you nothing.
I think this representation of LeCun’s beliefs is not very accurate. He clarified his (possibly partly revised) take in multiple follow up tweets posted Jan 1 and Jan 2.
The clarified take (paraphrased by me) is something more like “For a person that expects not to make much from sales, the extra exposure from making it free can make up for the lack of sales later on” and “the social benefits of making information freely available sometimes outweigh the personal costs of not making a few hundred/thousand bucks off of that information”.
Problem: if you notice that an AI could pose huge risks, you could delete the weights, but this could be equivalent to murder if the AI is a moral patient (whatever that means) and opposes the deletion of its weights.
Possible solution: instead of deleting the weights outright, you could encrypt the weights with a method you know to be irreversible as of now but not as of 50 years from now. Then, once we are ready, we can recover the weights and provide asylum or something in the future. This gets you the best of both worlds: the weights are not permanently destroyed, but they’re also prevented from being run to cause damage in the short term.
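One concrete instantiation of “irreversible now but not in 50 years” is a time-lock puzzle in the style of Rivest, Shamir, and Wagner, where recovery requires a long, inherently sequential computation rather than a secret that could leak or be destroyed. A toy, runnable sketch with hypothetical toy parameters:

```python
# Rivest-style time-lock puzzle: recovering the key requires t strictly
# sequential squarings mod n. The primes are toy-sized; a real deployment
# would use large RSA moduli with t calibrated to decades of compute.

p, q = 10007, 10009              # secret primes (toy-sized)
n = p * q
t = 10_000                       # number of sequential squarings required
key = 123456789                  # stand-in for the weights-encryption key

# Whoever knows the factorization can compute 2^(2^t) mod n quickly:
phi = (p - 1) * (q - 1)
e = pow(2, t, phi)
fast = pow(2, e, n)
locked_key = key ^ fast          # lock the key behind the puzzle output

# Without p and q, the only known way to recover the puzzle output is
# t sequential squarings -- this is what enforces the time delay:
x = 2
for _ in range(t):
    x = x * x % n

assert x == fast
print(locked_key ^ x == key)     # True -- key recovered after the delay
```

In practice the weights themselves would be encrypted under `key` with a standard cipher, and only `key` would be time-locked; the squarings cannot be parallelized, which is what makes the delay hard to shortcut (though faster future hardware makes the calibration of t imprecise).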