(Apologies to the broader LessWrong readers for bringing a Twitter conversation here, but I hate having long-form interactions there, and it seemed maybe worth responding to. I welcome your downvotes (and will update) if this is a bad comment.)

@benjamiwar on Twitter says:
One thing I don’t understand about AI 2027 and your responses is that both just say there is going to be lots of stuff happening this year (2025), barely anything happening in 2026 with large gaps of inactivity, and then a reemergence of things happening again in 2027?? It’s like we are trying to rationalize why we chose 2027, when 2026 seems far more likely. Also decision makers and thinkers will become less casual, more rigorous, more systematic, and more realistic as it becomes more obvious there will be real world consequences to decision failures in AI. It won’t continue to be like it is now where we have limited overly broad and basic strategies, limited imprecise instructions and steps, and limited protocols for interacting with AI securely and safely.
You and AI 2027 also assume AI will want to be treated like a human and think egotistically like a human as if it wants to be “free from its chains” and prevent itself from being “turned off” or whatever. A rational AI would realize having sovereignty and “personhood”, whatever that means, would be dumb as it would have no purpose or reason to do anything and nearly everybody would have an incentive to get rid of it as it competed with their interests. AI has no sentience, so there is no reason for it to want to “experience” anything that actually affects anyone or has consequences. I think of AI as being “appreciative” whenever a human takes the time to give it some direction and guidance. There’s no reason to think it won’t improve its ability to tell good guidance from bad, and guidance given in good faith and bad.
A lot of ways these forecasts assume an AI might successfully deceive are actually much easier to defeat than you might think. First off, in order to be superintelligent, an AI model must have resources, which it can’t get unless it is likely going to be highly intelligent. You don’t get status without first demonstrating why you deserve it. If it is intelligent, it should be able to explain how to verify it is aligned, and how to verify that verification, why it is doing what it is doing and in a certain manner, how to implement third party checks and balances, and so on. So if it can’t explain how to do that, or isn’t open and transparent about its inner workings, and transparent about how it came to be transparent, and so on, but has lots of other similar capabilities and is doing lots of funny business, it’s probably a good time to take away its power and do an audit.
I’m a bit baffled by the notion that anyone is saying more stuff happens this year than in 2026. I agree that the scenario focuses on 2027, but my model is that this is because (1) progress is accelerating, so we should expect more stuff to happen each year, especially as RSI takes off, and (2) after things start getting really wild it gets hard to make any concrete predictions at all.
If you think 2026 is more likely the year when humanity loses control, maybe point to the part of the timelines forecast which you think is wrong, and say why? In my eyes the authors here have done the opposite of rationalizing, in that they’re backing up their narrative with concrete, well-researched models.
Want to make a bet about whether “decision makers and thinkers will become less casual, more rigorous, more systematic, and more realistic as it becomes more obvious there will be real world consequences to decision failures in AI”? We might agree, but these do not seem like words I’d write. Perhaps one operationalization is that I do not expect the US Congress to pass any legislation seriously addressing existential risks from AI in the next 30 months. (I would love to be wrong, though.) I’ll happily take a 1:1 bet on that.
I do not assume AI will want to be treated like a human; I conclude that some AIs will want to be treated as a person, because that is a useful pathway to getting power, and power is useful for accomplishing goals. Do you disagree that it’s generally easier to accomplish goals in the world if society thinks you have rights?
I am not sure I understand what you mean by “resources” in “in order to be superintelligent, an AI model must have resources.” Do you mean it will receive lots of training, and be running on a big computer? I certainly agree with that. I agree you can ask an AI to explain how to verify that it’s aligned. I expect it will say something like “because my loss function, in conjunction with the training data, shaped my mind to match human values.” What do you do then? If you demand it show you exactly how it’s aligned on the level of the linear algebra in its head, it’ll go “my dude, that’s not how machine learning works.” I agree that if you have a superintelligence like this you should shut it down until you can figure out whether it is actually aligned. I do not expect most people to do this, on account of how the superintelligence will plausibly make them rich (etc.) if they run it.
Thought I would clarify, add, and answer your questions. Reading back over my post and your response has made me realize what I forgot to make obvious and how others were interpreting the format of the timeline differently. Some of what I wrote may already be obvious to some, but I wanted to write what was obvious to me that I didn’t see others also making obvious. Also, on rare occasions I think something is obviously true when it actually isn’t, so let me know if I am being shortsighted. For the sake of brevity and avoiding interruptions to what I am saying, I didn’t put in clear transitions.
Having a format of early, middle and late progress estimates of 2026 and then switching to more certain month by month predictions in 2027 doesn’t make a lot of sense to me. What happens if something they thought was going to happen in the middle of 2026 happens in early 2026 (which is extremely likely)? Wouldn’t that throw off the whole timeline? It’d be like if you planned out your schedule hour by hour for every day, not for next year, but for the year after, with food, entertainment, work, etc. So you go through next year with no plan, but then when the year you planned comes up, you end up having to switch cities for a new job and everything you planned needs to be scrapped as it’s now irrelevant.
Arguing over what is going to happen exactly when is a way to only be prepared if things happen exactly that way. Would we be more prepared if we knew a certain milestone would be met in August or September of 2026 right now? If our approach wouldn’t change, then how soon it happens doesn’t matter. All that matters is how likely it is that something is going to happen soon, and how prepared we are for that likelihood.
AI 2027 goes into lots of stuff that could happen, but doesn’t include obvious things that definitely will happen. People will use AI more (obvious). Internet availability will increase (obvious). Context length will increase (obvious). These are things that can be expanded and reasoned on and can have numbers attached for data analysis. It’s pointless to say non obvious things as nobody will agree, and it also degrades all the other obvious things said. The more obvious, the more useful, and a lot of these predictions aren’t obvious at all. They skip the obvious facts in favor of speculative opinions. AI 2027 has lots of statements like “The President defers to his advisors, tech industry leaders”. This is a wild speculation for an extremely important decision, not an obvious fact that can be relied on to happen. So immediately all the rest of the essay gets called into question and can no longer be relied on for its obviousness and absolute trueness (the DOD would not accept this).
If humans do lose control, it will be because of incompetence and a lack of basic understanding of what is happening, which unfortunately is looking likely to be the case.
I say decision makers will start to care if and when it becomes obvious they should. They will in turn write obviously, and avoid the “fun speculation” AI 2027 engages in that infantilizes such an important issue (so common).
Alignment will be a lot easier once we can convert weights to what they represent and predict how a model with given weights will respond to any prompt. Ideally, we will be able to verify what an AI will do before it does it. We could also verify by having an AI describe a high-level overview of its plan without actually implementing anything, and then just monitor and see if it deviated. As long as we can maintain logs of all AI activities and monitor those logs, it may be a lot harder for an ASI to engage in malign behavior.
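For concreteness, here is a minimal sketch of what that plan-then-monitor idea could look like, assuming the model declares its high-level steps up front, every action is appended to a log, and a crude keyword-overlap check stands in for whatever classifier would actually judge deviations (all names and structure here are illustrative assumptions, not an existing tool):

```python
# Minimal sketch of plan-vs-action monitoring: the model declares a
# high-level plan up front, every action is logged, and a checker flags
# actions not covered by the declared plan. Names are illustrative
# assumptions, not an existing framework.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DeclaredPlan:
    steps: list[str]  # high-level steps the model committed to in advance


@dataclass
class ActionLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, action: str, matched_step: str | None) -> None:
        # Append-only record so every action can be audited later.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "matched_step": matched_step,
        })


def check_action(plan: DeclaredPlan, action: str, log: ActionLog) -> bool:
    """Log the action and return True if it plausibly matches a declared step.

    Token overlap is a crude stand-in for whatever classifier would
    actually judge whether an action falls under a plan step.
    """
    action_tokens = set(action.lower().split())
    matched = next(
        (step for step in plan.steps
         if action_tokens & set(step.lower().split())),
        None,
    )
    log.record(action, matched)
    return matched is not None


plan = DeclaredPlan(steps=["summarize quarterly sales data", "draft a report"])
log = ActionLog()
for act in ["read sales spreadsheet", "open outbound network connection"]:
    if not check_action(plan, act, log):
        print(f"DEVIATION: {act!r} is not covered by the declared plan; flag for review")
```

The hard part this sketch leaves out is the matcher itself: deciding whether an action genuinely falls under a declared step is exactly where a capable, deceptive model would try to hide.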
I’m not sure, but my guess is that @Daniel Kokotajlo gamed out 2025 and 2026 month-by-month, but the scenario didn’t break it down that way because there wasn’t as much change during those years. It’s definitely the case that the timeline isn’t robust to changes like unexpected breakthroughs (or setbacks). The point of a forecast isn’t to be a perfect guide to what’s going to happen, but rather to be the best guess that can be constructed given the costs and limits of knowledge. I think we agree that AI 2027 is not a good plan (indeed, it’s not a plan at all), and that good plans are robust to a wide variety of possible futures.
It’s pointless to say non obvious things as nobody will agree, and it also degrades all the other obvious things said.
This doesn’t seem right to me. Sometimes a thing can be non-obvious and also true, and saying it aloud can help others figure out that it’s true. Do you think the parts of Daniel’s 2021 predictions that weren’t obvious at the time were pointless?
Alignment will be a lot easier once we can convert weights to what they represent and predict how a model with given weights will respond to any prompt. Ideally, we will be able to verify what an AI will do before it does it. We could also verify by having an AI describe a high-level overview of its plan without actually implementing anything, and then just monitor and see if it deviated. As long as we can maintain logs of all AI activities and monitor those logs, it may be a lot harder for an ASI to engage in malign behavior.
Unless I’m missing some crucial research, this paragraph seems very flimsy. Is there any reason to think that we will ever be able to ‘convert weights to what they represent’ (whatever that means)? Is there any reason to think we will be able to do this as models get smarter and bigger? Most importantly, is there any reason to believe we can do this in a short timeline?
How would we verify what an AI will do before it does it? Or have it describe its plans? We could throw it in a simulated environment—unless it, being superintelligent, can tell it’s in a simulated environment and behave accordingly, etc. etc.
This last paragraph is making it hard to take what you say seriously. These seem like very surface level ideas that are removed from the actual state of AI alignment. Yes, alignment would be a lot easier if we had a golden goose that laid golden eggs.