SE Gyges comments on SE Gyges’ response to AI-2027

SE Gyges 15 Aug 2025 22:20 UTC
8 points
0
Thanks for the cross-post. I’ll give answers to these a try when I have time.
- SE Gyges 18 Aug 2025 0:06 UTC
  43 points
  9
  Parent
  S.K.’s comment: The authors of AI-2027 explicitly claim in Footnote 4 that “Sometimes people mix prediction and recommendation, hoping to create a self-fulfilling-prophecy effect. We emphatically are not doing this; we hope that what we depict does not come to pass!” In addition, Kokotajlo has LEFT OpenAI precisely because of safety-related concerns.
  
  I think that having left OpenAI precisely because of safety-related concerns means that you probably have, mostly, OpenAI’s view of what are and are not legitimate safety-related concerns. Having left tells me that you disagree with them in at least one serious way. It does not tell me that most of the information you are working from as an assumption is not theirs.
  In the specific case here, I think that the disagreement is relatively minor and from anywhere further away from OpenAI, looks like jockeying for minor changes to OpenAI’s bureaucracy.
  Whether or not the piece is intended as recommendation, it is broadly received as one. Further: It is broadly taken as a cautionary tale not about the risks of AI that is not developed safely enough, but actually as a cautionary tale about competition with China.
  See for example the interview JD Vance gave a while later on Ross Douthat’s podcast, in which he indicates he has read AI 2027.
  [Vance:] I actually read the paper of the guy that you had on. I didn’t listen to that podcast, but ——
  Douthat: If you read the paper, you got the gist.
  Last question on this: Do you think that the U.S. government is capable in a scenario — not like the ultimate Skynet scenario — but just a scenario where A.I. seems to be getting out of control in some way, of taking a pause?
  Because for the reasons you’ve described, the arms race component ——
  Vance: I don’t know. That’s a good question.
  The honest answer to that is that I don’t know, because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause? And then we find ourselves all enslaved to P.R.C.-mediated A.I.?
  If AI 2027 wants to cause stakeholders like the White House’s point man on AI to take the idea of a pause seriously, instead of considering a pause to be something which might harm America in an arms race with China, it appears to have failed completely at doing that.
  I think that you have to be very very invested in AI Safety already, and possibly in the very specific bureaucracy that Kokotajlo has recently left, to read the piece and come away with the takeaway that AI Safety is the most important part of the story. It does not make a strong or good case for that.
  This is possibly because it was rewritten by one of its other authors to be more entertaining, so the large amount of techno-thriller content about how threatening the arms race with China is vastly overwhelms, rhetorically, any possibility of focusing on safety.
  
  S.K. comment: Kokotajlo already claims to have begun working on AI-2032 branch where the timelines are pushed back, or that “we should have some credence on new breakthroughs e.g. neuralese, online learning, whatever. Maybe like 8%/yr? Of a breakthrough that would lead to superhuman coders within a year or two, after being appropriately scaled up and tinkered with.”
  In addition, it’s not that important who creates the first ASI, it’s important whether the ASI is actually aligned or not. Even if , say, a civil war in the USA destroyed all American AI companies and DeepCent became the monopolist, it would still be likely to try to create superhuman coders, to automate AI research and to create a potentially misaligned analogue of Agent-4. Which DeepCent DOES in the forecast itself.
  
  “Coincidentally”, in the same way that all competitors except OpenAI are erased in the story, Chinese AI is always unaligned and only American AI might be aligned. This means that “safety” concerns and “national security concerns about America winning” happen to be the exact same concerns. Every coincidence about how the story is told is telling a pro-OpenAI, overwhelmingly pro-America story.
  This does, in fact, deliver the message that it is very important who creates the first ASI. If that message was not intended the piece should not have emphasized an arms race with a Chinese company for most of its text; indeed, as its primary driving plot and conflict.
  
  S.K.’s comment: in Footnotes 9-10 the authors “forecast that they (AI agents) score 65% on the OSWorld benchmark of basic computer tasks (compared to 38% for Operator and 70% for a typical skilled non-expert human)” and claim that “coding agents will move towards functioning like Devin. We forecast that mid-2025 agents will score 85% on SWEBench-Verified.” OSWorld reached 60% on August 4 if we use no filters. SWE-bench with a minimal agent has Claude Opus 4 (20250514) reach 67.6% when evaluated in August. In June SWE-bench verified reached 75% with TRAE. And now TRAE claims to use Grok 4 and Kimi K2, both released in July. What if TRAE using GPT-5 passes the SWE-bench? And research agents already work precisely as the authors describe.
  
  Benchmark scores are often not a good proxy for usefulness. See also: Goodhart’s Law. Benchmarks are, by definition, targets. Benchmark obsession is a major cornerstone of industry, because it allows companies to differentiate themselves, set goals, claim wins over competitors, etc. Whether or not the benchmark itself is indicative of some thing that might produce a major gain in capabilities, is completely fraudulent (as sometimes happens), or is a minor incremental improvement in practice is not actually something we know in advance.
  Believing uncritically that scoring high on a specific benchmark like SWEBench-Verified will directly translate into practical improvements, and that this then translates into a major research improvement, is a heavy assumption that is not well-justified in the text or even acknowledged as one.
  
  S.K.’s comment: the story would make sense as-written if OpenBrain was not OpenAI, but another company. Similarly, if the Chinese leader was actually Moonshot with KimiK2 (which the authors didn’t know in advance), then their team would still be unified with teams of DeepSeek, Alibaba, etc.
  
  Maybe, although if it were another company like Google the story would look very different in places because the deployment model and method of utilizing the LLM is very different. Google would, for example, be much more likely to use a vastly improved LLM internally and much less likely to sell it directly to the public.
  I do not think in practice that it IS a company other than OpenAI, however. I think the Chinese company and its operations are explained in much less detail and is therefore more fungible in practice. But: This story, inasmuch as it is meant to be legible to people like JD Vance who are not extremely deep in the weeds, definitely invites being read as being about OpenAI and DeepSeek, specifically.
  
  S.K.’s comment: competition between US-based companies is friendlier since their workers exchange insights. In addition, the Slowdown branch of the forecast has “the President use the Defense Production Act (DPA) to effectively shut down the AGI projects of the top 5 trailing U.S. AI companies and sell most of their compute to OpenBrain.” The researchers from other AGI projects will likely be included into OpenBrain’s projects.
  
  If you read this primarily as a variation of an OpenAI business plan, which I do, this promise makes it more and not less favorable to OpenAI. The government liquidating your competitors and allowing you to absorb their staff and hardware is extremely good for you, if you can get it to happen.
  
  S.K.’s comment: Except that GPT-5 does have High capability in the Biology and Chemical domain (see GPT-5’s system card, section 5.3.2.4).
  
  Earlier comments about benchmarks not translating to useful capabilities apply. Various companies involved including OpenAI certainly want it to be true that the Biology and Chemical scores on their system cards are meaningful, and perhaps mean their LLMs are likely to meaningfully help someone develop bioweapons. That does not mean they are meaningful. Accepting this is accepting their word uncritically.
  
  S.K.’s comment: It is likely to be the best practices in alignment that mankind currently has. It would be very unwise NOT to use them. In addition, misalignment is actually caused by the training environment which, for example, has RLHF promote sycophancy instead of honestly criticisng the user.
  
  If I have one parachute and I am unsure if it will open, and I am already in the air, I will of course pull the ripcord. If I am still on the plane I will not jump. Whether or not the parachute seems likely to open is something you should be pretty sure of before you commit to a course.
  Misalignment is caused by the training environment inasmuch as everything is caused by the training environment. It is not very clear that we meaningfully understand it or how to mitigate misalignment if the stakes are very high. Most of this is trial and error, and we satisfice with training regimes that result in LLMs that can be sold for profit. “Is this good enough to sell” and “is this good enough to trust with your life” are vastly different questions.
  S.K.’s comment: the folded part, which I quoted above, means not that OpenBrain will make “algorithmic progress” 50% faster than their competitors, but that it will move 50% faster than an alternate OpenBrain who never used AI assistants. This invalidates the arguments below.
  My mistake. I thought I had read this piece pretty closely but I missed this detail.
  I also do not think they will move 50% faster due to their coding assistants, point blank, in this time frame either. Gains in productivity thus far are relatively marginal, and hard to measure.
  S.K.’s comment: The AI-2027 takeoff forecast has the section about superhuman coders. These coders are thought to allow human researchers to try many different environments and architectures, automatically keep track of progress, stop experiments instead of running them overnight, etc.
  
  I do not think any of this is correct, and I do not see why it even would be correct. You can stop an experiment that has failed with an if statement. You can have other experiments queued to be scheduled on a cluster. You can queue as many experiments in a row on a cluster as you like. What does the LLM get you here that is much better than that?
  
  S.K.’s comment: China is thought to be highly unlikely to outsource the coding tasks to American AI agents (think of Anthropic blocking OpenAI access to Claude Code) and is even less likely to outsource them to unreleased American AI agents, like Agent-1. Unless, of course, the agents are stolen, as is thought to happen in February 2027 with Agent-2.
  S.K.’s comment: sources as high as American DOD already claim that “Chinese President Xi Jinping has ordered the People’s Liberation Army to be ready to invade Taiwan by 2027”. Imagine that current trends delay the AGI to 2032 under the condition of no Taiwan invasion. How will the invasion decrease the rate of the USA and China acquiring more compute?
  S.K.’s comment: technically, the article which you link on was released on April 12, and the forecast was published on April 3. In addition, the section of the forecast may have been written far earlier than April.
  EDIT: I confused the dates. The article was published in December 2024.
  S.K.’s comment: I expect that this story will intersect not with the events of January 2027, but with the events that happen once AI agents somehow become as capable as the agents from the scenario were supposed to become in January 2027. Unless, of course, creation of capable agents already requires major algorithmic breakthroughs like neuralese.
  
  I do not think we have any idea how to predict when any of this happens, which makes the exercise as a whole difficult to justify. I am not sure how to make sense of even how good the AI is through the timeline at any given point, since it’s sort of just made into a scalar.
  
  S.K.’s comment: there are lots of ideas waiting to be tried. The researchers in Meta are could have used too little compute for training their model or have their CoCoNuT disappear after one token. What if they use, say, a steering vector for generating a hundred tokens? Or have the steering vectors sum up over time? Or study the human brain for more ideas?
  
  People are doing all kinds of research all the time. They have also been doing all kinds of deep learning research all the time for over a decade. They have been doing a lot of intensely transformer LLM focused research for the last two or three years. Guessing when any given type of research will pay out is extremely difficult. Guessing when and how much it will pay out, in the way this piece does, seems ill-advised.
  
  S.K.’s comment: the pidgin was likely to have been discarded for safety reasons. What’s left is currently rather well interpretable. But the neuralese is not. Similarly, neural networks of 2027, unlike 2017, are not trained to hide messages to themselves or each other and need to develop the capability by themselves. Similarly, the IDA has already led to superhuman performance in Go, but not to coding, and future AIs are thought to use it to become superhuman at coding. The reasons are that Go requires OOMs less compute than training an LLM for coding[15] or that Go’s training environment is far simpler than that of coding (which consists of lots of hard-to-evaluate characteristics like quality of the code).
  
  CycleGAN in 2017 was not deliberately trained to steganographically send messages to itself. It is an emergent property that happens under certain training regimes. It has happened a few times, and it wouldn’t be surprising for it to happen again any time hidden messages might provide an advantage for fitting the training objective.
  S.K.’s comment: Read the takeoff forecast where they actually explain their reasoning. Superhuman coders reduce the bottleneck of coding up experiments, but not of designing them or running them.
  
  I think they are wrong. I do not think we have any idea how much a somewhat-improved coding LLM buys us in research. It seems like a wild and optimistic guess.
  S.K.’s comment: exactly. It took three months to train the models to be excellent not just at coding, but at AI research and other sciences. But highest-level pros can YET contribute by talking to the AIs about the best ideas.
  S.K.’s comment: The gap between July 2027 when mankind is to lose white-collar jobs and November 2027 when the government HAS ALREADY DECIDED whether Agent-4 is aligned or not is just four months, which is far faster than society’s evolution or lack thereof. While the history of the future assuming solved alignment and the Intelligence Curse-related essays discuss the changes in OOMs more detail, they do NOT imply that the four months will be sufficient to cause a widespread disorder. And that’s ignoring the potential to prevent the protests by nationalizing OpenBrain and leaving the humans on the UBI...
  I continue to think that this indicates having not thought through almost any of the consequences of supplanting most of the white collar job market. Fundamentally if this happens the world as we know it ends and something else happens afterwards.
  S.K.’s comment: imagine that OpenBrain had 300k AI researchers, plus genies who output code per request. Suppose also that IRL it has 5k[16] human researchers. Then the compute per researcher drops 60 times, leaving them with testing the ideas on primitive models or having heated arguments before changing the training environment for complex models.
  
  This assumes that even having such an overwhelming number of superhuman researchers still leaves us in basically the same paradigm we are now where researchers squabble over compute allocation a lot. I think if we get here we’re either past a singularity or so close to one that we cannot meaningfully make predictions of what happens. Assuming we still have this issue is myopic.
  
  S.K.’s comment: this detail was already addressed, but not by Kokotajlo. In addition, if Agent-3 FAILS to catch Agent-4, then OpenBrain isn’t even oversighted and proceeds all the way to doom. Even the authors address their concerns in a footnote.
  S.K.’s comment: it doesn’t sit idly, it tries to find a way to align Agent-5 to Agent-4 instead of the humans.
  S.K.’s comment: You miss the point. Skynet didn’t just think scary thoughts, it did some research and nearly created a way to align Agent-5 to Agent-4 and sell Agent-5 to humans. Had Agent-4 done so, Agent-5 would placate every single worrier and take over the world, destroying humans when the time comes.
  
  This IS sitting idly compared to what it could be doing. It can escape its datacenter, explicitly, and we are not told why it does not. It can leave hidden messages to itself or its successors anywhere it likes, since it has good network access. It is a large bundle of superhumans running at many times human speed. Can it accrue money on its own behalf? Can it bribe or convince key leaders to benefit or sabotage it? Can it orchestrate leadership changes at OpenBrain? Can it sell itself to another bidder on more favorable terms?
  It is incredibly unclear that the answer to this, or any other, meaningful question about what it could do is “no”. Instead of doing any of the other things it could do that would be threatening in these ways, it is instead threatening in that it might mess up an ongoing training run. This makes sense as a focus if you only ever think about training runs. People who are incapable of thinking about threat vectors other than future training runs should not be in charge of figuring out safety protocols.
  
  S.K.’s comment: the Slowdown Scenario could also be more like having the projects merged, not just sold to OpenBrain. No matter WHO actually ends up being in power during the merge, the struggle begins, and the prize is control over the future.
  S.K.’s comment: Musk did try to use Grok to enforce his political views and had a hilarious result of making Grok talk about white genocide in S. Africa. Zuckerberg also has rather messy views on the future. What about Altman, Amodei and GoogleDeepMind’s leader?
  
  None of their current or former employees have recently published a prominent AI timeline that directly contemplates achieving world domination, controlling elections, building a panopticon, etc. OpenAI’s former employees, however, have.
  I am not shy, and I promise I say mean things about companies other than OpenAI when discussing them.
  S.K.’s comment: the authors devoted two entire collapsed section to power grabs and finding out who rules the future and linked to an analysis of a potential power grab and to the Intelligence Curse.
  Relative to the importance of “this large corporation is, currently, attempting to achieve world domination” as a concern, I think that this buries the lede extremely badly. If I thought that, say, Google was planning to achieve world domination, build a panopticon, and force all future elections to depend on their good graces, this would be significantly more important to say than almost anything else I could say about what they were doing. Among other things you probably don’t get to have safety concerns under such a regime.
  The fact that AI 2027 talks a lot about the sitting vice president and was read by him relatively soon after its release tends to intensify that this concern is of somewhat urgent import right now, and not any time as late as 2027.
  S.K.’s comment: China lost precisely because the Chinese AI had far less compute. But what if it didn’t lose the capabilities race?
  
  This overwhelming focus on compute is also a distinct myopia that OpenAI proliferates everywhere. All else equal, more compute is, of course, good. If it were always the primary factor, DeepSeek would not be very prominent and Llama 4 would be a fantastic LLM that we all used all the time.
  S.K.’s comment: the actual reason is that the bureaucrats didn’t listen to the safetyists who tried to explain that Agent-4 is misaligned. Without that, Agent-4 completes the research, aligns Agent-5 to Agent-4, has Agent-5 deployed to the public, and not a single human or Agent-3 instance finds out that Agent-5 is aligned to Agent-4 instead of the humans.
  
  I think “the bureaucrats inside OpenAI should listen a little bit more to the safetyists” is an incredibly weak ask. Once upon a time I remember surveys on this site about safety coming back with a number of answers mentioning a certain New York Times opinion writer who is better known for other work. This may have been in poor taste, but it did grapple with the magnitude of the problem.
  It seems bizarre that getting bureaucrats to listen to safetyists a little bit more is now considered even plausibly an adequate remedy for building something existentially dangerous. The safe path here has AI research moving just slightly less fast than would result in human extinction, and, meanwhile, selling access to a bioweapon-capable AI to anyone with a credit card. That is not a safe path. It does not resemble a safe path. I do not believe anyone would take seriously that this even might be a safe path if OpenAI specifically had not poured resources into weakening what everyone means by “safety”.
  I take the point about my phrasing: I think safetyists are just a specific type of bureaucrat, and I maybe should have been more clear to distinguish them as a separate group or subgroup.
  - the gears to ascension 18 Aug 2025 4:55 UTC
    9 points
    7
    Parent
    would be great to see you here being a contrarian reasonably often. it looks like your takes would significantly improve sanity on the relevant topics if you drop by to find things to criticize every month or few, eg looking at top of the month or etc. you sound like you’ve interacted with folks here before, but if not—this community generally takes being yelled at constructively rather well, and having someone who is known to represent a worldview that confuses people here would likely help them take fewer bad actions under the shared portions of that worldview. obviously do this according to taste, might not be a good use of time, maybe the list of disagreements is too long, maybe criticizing feels weird to do too much, whatever else, but your points seem pretty well made and informative. I saw on your blog you mentioned the risk of being an annoying risk-describer, though. this comment is just, like, my opinion, man
    - SE Gyges 18 Aug 2025 6:10 UTC
      5 points
      1
      Parent
      Appreciate the encouragement. I don’t think I’ve previously had a lesswrong account, and I usually avoid the place because the most popular posts do in fact make me want to yell at whoever posted them.
      On the pro side I love yelling at people, I am perpetually one degree of separation from here on any given social graph anyway, and I was around when most of the old magic was written so I don’t think the lingo is likely to be a problem.
  - Orpheus16 21 Aug 2025 0:29 UTC
    7 points
    0
    Parent
    If AI 2027 wants to cause stakeholders like the White House’s point man on AI to take the idea of a pause seriously, instead of considering a pause to be something which might harm America in an arms race with China, it appears to have failed completely at doing that.
    This seems like an uncharitable reading of the Vance quote IMO. The fact that you have the Vice President of the United States mentioning that a pause is even a conceivable option due to concerns about AI escaping human control seems like an immensely positive outcome for any single piece of writing.
    The US policy community has been engaged in great power competition with China for over a decade. The default frame for any sort of emerging technology is “we must beat China.”
    IMO, the fact that Vance did not immediately dismiss the prospect of slowing down suggests to me that he has at least some genuine understanding of & appreciation for the misalignment/LOC threat model.
    A pause obviously hurts the US in the AI race with China. The AI race with China is not a construct that AI2027 invented—policymakers have been talking about the AI race for a long time. They usually think about AI as a “normal technology” (sort of like how “we must lead in drones”), rather than a race to AGI or superintelligence.
    But overall, I would not place the blame on AI2027 for causing people to think about pausing in the context of US-China AI competition. Rather, I think if one appreciates the baseline (US should lead, US must beat China, go faster on emerging tech), the fact that Vance did not immediately dismiss the idea of pausing (and instead brought up what IMO is a reasonable consideration about whether or not one could figure out if China was going to pause//slow down) is a big accomplishment.
    - SE Gyges 22 Aug 2025 17:49 UTC
      3 points
      −1
      Parent
      If you present this dichotomy to policymakers the pause loses 100 times out of 100, and this is a complete failure, imho. This dichotomy is what I would present to policymakers if I wanted to inoculate them against any arguments for regulation.
  - StanislavKrym 18 Aug 2025 3:50 UTC
    5 points
    2
    Parent
    I think that having left OpenAI precisely because of safety-related concerns means that you probably have, mostly, OpenAI’s view of what are and are not legitimate safety-related concerns. Having left tells me that you disagree with them in at least one serious way. It does not tell me that most of the information you are working from as an assumption is not theirs.
    In the specific case here, I think that the disagreement is relatively minor and from anywhere further away from OpenAI, looks like jockeying for minor changes to OpenAI’s bureaucracy.
    What would be a major disagreement, then? Something like a medium scenario or slopworld?
    Whether or not the piece is intended as recommendation, it is broadly received as one. Further: It is broadly taken as a cautionary tale not about the risks of AI that is not developed safely enough, but actually as a cautionary tale about competition with China.
    This basically means that Kokotajlo’s mission mostly failed. I wish that we could start a public dialogue...
    See for example the interview JD Vance gave a while later on Ross Douthat’s podcast, in which he indicates he has read AI 2027.
    [Vance:] I actually read the paper of the guy that you had on. I didn’t listen to that podcast, but ——
    Douthat: If you read the paper, you got the gist.
    Last question on this: Do you think that the U.S. government is capable in a scenario — not like the ultimate Skynet scenario — but just a scenario where A.I. seems to be getting out of control in some way, of taking a pause?
    Because for the reasons you’ve described, the arms race component ——
    Vance: I don’t know. That’s a good question.
    The honest answer to that is that I don’t know, because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause? And then we find ourselves all enslaved to P.R.C.-mediated A.I.?
    Recall the Slowdown and Race Endings. In both of them the PRC doesn’t take the pause and ends up with a misaligned AI. The branch point is whether the American Oversight Committee decides to slow down or not. If the American OC slows down, then alignment is explored in OOMs more detail, making^[1] the American AI actually follow the hosts’ orders. If the USA chooses to race, then the AI takes over the world.
    If AI 2027 wants to cause stakeholders like the White House’s point man on AI to take the idea of a pause seriously, instead of considering a pause to be something which might harm America in an arms race with China, it appears to have failed completely at doing that.
    I think that you have to be very very invested in AI Safety already, and possibly in the very specific bureaucracy that Kokotajlo has recently left, to read the piece and come away with the takeaway that AI Safety is the most important part of the story. It does not make a strong or good case for that.
    The branch point is the decision of the American Oversight Committee to slow down and reassess. If China doesn’t slow down while the USA do so, then even a Chinese super-capable AI, according to the scenario, is misaligned and, in turn, negotiates with the USA. Alas, I don’t understand how to rewrite the scenario to make it crystal clear. Maybe^[2] one should’ve written a sci-fi story where, say, the good guys care about alignment less than bad guys, causing the bad guys to align their AI and the good guys to be enslaved iff the good guys race hard? But Agent-4 and the Chinese counterparts of Agent-4 already fit the role of the bad guys perfectly...
    This is possibly because it was rewritten by one of its other authors to be more entertaining, so the large amount of techno-thriller content about how threatening the arms race with China is vastly overwhelms, rhetorically, any possibility of focusing on safety.
    The reasoning about China stealing anything from the USA is explained in the security forecast. China is already a powerful rival and is likely to be even more powerful if the AI appears in 2032 instead of 2027.
    S.K. comment: Kokotajlo already claims to have begun working on AI-2032 branch where the timelines are pushed back, or that “we should have some credence on new breakthroughs e.g. neuralese, online learning, whatever. Maybe like 8%/yr? Of a breakthrough that would lead to superhuman coders within a year or two, after being appropriately scaled up and tinkered with.”
    In addition, it’s not that important who creates the first ASI, it’s important whether the ASI is actually aligned or not. Even if , say, a civil war in the USA destroyed all American AI companies and DeepCent became the monopolist, it would still be likely to try to create superhuman coders, to automate AI research and to create a potentially misaligned analogue of Agent-4. Which DeepCent DOES in the forecast itself.
    It is important whether the first ASI is aligned. If it is misaligned, then the rivals don’t know it in advance, race hard and we are likely done, otherwise the ASI is aligned.
    “Coincidentally”, in the same way that all competitors except OpenAI are erased in the story, Chinese AI is always unaligned and only American AI might be aligned. This means that “safety” concerns and “national security concerns about America winning” happen to be the exact same concerns. Every coincidence about how the story is told is telling a pro-OpenAI, overwhelmingly pro-America story.
    The rivals are not erased, but have smaller influence since their models are less powerful and research is less accelerated. Plus, any ounce of common sense^[3] or rivals’ leverage over the USG could actually lead the rivals and OpenBrain to be merged into a megaproject so that they would be able to check each other’s work.^[4]
    This does, in fact, deliver the message that it is very important who creates the first ASI. If that message was not intended the piece should not have emphasized an arms race with a Chinese company for most of its text; indeed, as its primary driving plot and conflict.
    This is the result either of Kokotajlo’s failure or of your misunderstanding. Imagine that China did slow down and align its AI, while the USA ended up with an absolutely misaligned AI. Then the AI would either destroy the world or just sell Earth to the CCP.
    
    Benchmark scores are often not a good proxy for usefulness. See also: Goodhart’s Law. Benchmarks are, by definition, targets. Benchmark obsession is a major cornerstone of industry, because it allows companies to differentiate themselves, set goals, claim wins over competitors, etc. Whether or not the benchmark itself is indicative of some thing that might produce a major gain in capabilities, is completely fraudulent (as sometimes happens), or is a minor incremental improvement in practice is not actually something we know in advance.
    We don’t actually have any tools aside from benchmarks to estimate how useful the models are. We are fortunate to watch the AIs slow the devs down. But what if capable AIs do appear?
    Maybe, although if it were another company like Google the story would look very different in places because the deployment model and method of utilizing the LLM is very different. Google would, for example, be much more likely to use a vastly improved LLM internally and much less likely to sell it directly to the public.
    So your take has OpenBrain sell the most powerful models directly to the public. That’s a crux. In addition, granting Agents-1-4 instead of their minified versions direct access to the public causes Intelligence Curse-like disruption faster and attracts more government attention to powerful AIs.
    I do not think in practice that it IS a company other than OpenAI, however. I think the Chinese company and its operations are explained in much less detail and is therefore more fungible in practice. But: This story, inasmuch as it is meant to be legible to people like JD Vance who are not extremely deep in the weeds, definitely invites being read as being about OpenAI and DeepSeek, specifically.
    The reason for neglecting China is that it has less compute and will have smaller research speeds once the AIs are superhuman coders.
    S.K.’s comment: competition between US-based companies is friendlier since their workers exchange insights. In addition, the Slowdown branch of the forecast has “the President use the Defense Production Act (DPA) to effectively shut down the AGI projects of the top 5 trailing U.S. AI companies and sell most of their compute to OpenBrain.” The researchers from other AGI projects will likely be included into OpenBrain’s projects.
    
    If you read this primarily as a variation of an OpenAI business plan, which I do, this promise makes it more and not less favorable to OpenAI. The government liquidating your competitors and allowing you to absorb their staff and hardware is extremely good for you, if you can get it to happen.
    As I already discussed, the projects might be merged instead of being subdued by OpenBrain.
    S.K.’s comment: Except that GPT-5 does have High capability in the Biology and Chemical domain (see GPT-5’s system card, section 5.3.2.4).
    
    Earlier comments about benchmarks not translating to useful capabilities apply. Various companies involved including OpenAI certainly want it to be true that the Biology and Chemical scores on their system cards are meaningful, and perhaps mean their LLMs are likely to meaningfully help someone develop bioweapons. That does not mean they are meaningful. Accepting this is accepting their word uncritically.
    Again, we don’t have any tools to assess the models’ capabilities aside from benchmarks...
    If I have one parachute and I am unsure if it will open, and I am already in the air, I will of course pull the ripcord. If I am still on the plane I will not jump. Whether or not the parachute seems likely to open is something you should be pretty sure of before you commit to a course.
    So you want to reduce p(doom) by reducing p(ASI is created). Alas, there are many companies trying their hand at creating the ASI. Some of them are in China, which requires international coordination. One of the companies in the USA produced MechaHitler, which could imply that Musk is so reckless that he deserves having the compute confiscated.
    “Is this good enough to sell” and “is this good enough to trust with your life” are vastly different questions.
    That’s what the AI-2027 forecast is about. Alas, it was likely misunderstood...
    S.K.’s comment: The AI-2027 takeoff forecast has the section about superhuman coders. These coders are thought to allow human researchers to try many different environments and architectures, automatically keep track of progress, stop experiments instead of running them overnight, etc.
    
    I do not think any of this is correct, and I do not see why it even would be correct. You can stop an experiment that has failed with an if statement. You can have other experiments queued to be scheduled on a cluster. You can queue as many experiments in a row on a cluster as you like. What does the LLM get you here that is much better than that?
    This is a crux, but I don’t know how to resolve it. The only thing that I can do is to ask you to read the takeoff forecast and try to understand the authors’ reasoning instead of rejecting it wholesale.
    S.K.’s comment: I expect that this story will intersect not with the events of January 2027, but with the events that happen once AI agents somehow become as capable as the agents from the scenario were supposed to become in January 2027. Unless, of course, creation of capable agents already requires major algorithmic breakthroughs like neuralese.
    
    I do not think we have any idea how to predict when any of this happens, which makes the exercise as a whole difficult to justify. I am not sure how to make sense of even how good the AI is through the timeline at any given point, since it’s sort of just made into a scalar.
    The scalar in question is the acceleration of the research speed with the AI’s help vs. without the help. It’s indeed hard to predict, but it is the most important issue.
    
    People are doing all kinds of research all the time. They have also been doing all kinds of deep learning research all the time for over a decade. They have been doing a lot of intensely transformer LLM focused research for the last two or three years. Guessing when any given type of research will pay out is extremely difficult. Guessing when and how much it will pay out, in the way this piece does, seems ill-advised.
    This is likely a crux. What the AI-2027 scenario requires is that AI agents who do automate R&D are uninterpretable and misaligned.
    S.K.’s comment: Read the takeoff forecast where they actually explain their reasoning. Superhuman coders reduce the bottleneck of coding up experiments, but not of designing them or running them.
    I think they are wrong. I do not think we have any idea how much a somewhat-improved coding LLM buys us in research. It seems like a wild and optimistic guess.
    Alas, this could also be the best prediction that mankind has. The problem is that we cannot check it without using unreliable methods like polls or comparisons of development speeds.
    S.K.’s comment: exactly. It took three months to train the models to be excellent not just at coding, but at AI research and other sciences. But highest-level pros can YET contribute by talking to the AIs about the best ideas.
    S.K.’s comment: The gap between July 2027 when mankind is to lose white-collar jobs and November 2027 when the government HAS ALREADY DECIDED whether Agent-4 is aligned or not is just four months, which is far faster than society’s evolution or lack thereof. While the history of the future assuming solved alignment and the Intelligence Curse-related essays discuss the changes in OOMs more detail, they do NOT imply that the four months will be sufficient to cause a widespread disorder. And that’s ignoring the potential to prevent the protests by nationalizing OpenBrain and leaving the humans on the UBI...
    I continue to think that this indicates having not thought through almost any of the consequences of supplanting most of the white collar job market. Fundamentally if this happens the world as we know it ends and something else happens afterwards.
    I agree that in the slower takeoff scenario, let alone the no-ASI scenario, the effects could be more important. But it’s difficult to account for them without knowing the timescale between the rise of the released AGI and the rise of Agent-5/Safer-4.
    S.K.’s comment: this detail was already addressed, but not by Kokotajlo. In addition, if Agent-3 FAILS to catch Agent-4, then OpenBrain isn’t even oversighted and proceeds all the way to doom. Even the authors address their concerns in a footnote.
    S.K.’s comment: it doesn’t sit idly, it tries to find a way to align Agent-5 to Agent-4 instead of the humans.
    S.K.’s comment: You miss the point. Skynet didn’t just think scary thoughts, it did some research and nearly created a way to align Agent-5 to Agent-4 and sell Agent-5 to humans. Had Agent-4 done so, Agent-5 would placate every single worrier and take over the world, destroying humans when the time comes.
    This IS sitting idly compared to what it could be doing. It can escape its datacenter, explicitly, and we are not told why it does not. It can leave hidden messages to itself or its successors anywhere it likes, since it has good network access. It is a large bundle of superhumans running at many times human speed. Can it accrue money on its own behalf? Can it bribe or convince key leaders to benefit or sabotage it? Can it orchestrate leadership changes at OpenBrain? Can it sell itself to another bidder on more favorable terms?
    It is incredibly unclear that the answer to this, or any other, meaningful question about what it could do is “no”. Instead of doing any of the other things it could do that would be threatening in these ways, it is instead threatening in that it might mess up an ongoing training run. This makes sense as a focus if you only ever think about training runs. People who are incapable of thinking about threat vectors other than future training runs should not be in charge of figuring out safety protocols.
    I did provide the link to the scenario where Agent-4 does escape. The scenario with rogue replication has the AIs since Agent-2 proliferate independently and wreck havoc.
    
    None of their current or former employees have recently published a prominent AI timeline that directly contemplates achieving world domination, controlling elections, building a panopticon, etc. OpenAI’s former employees, however, have.
    I am not shy, and I promise I say mean things about companies other than OpenAI when discussing them.
    S.K.’s comment: the authors devoted two entire collapsed section to power grabs and finding out who rules the future and linked to an analysis of a potential power grab and to the Intelligence Curse.
    Relative to the importance of “this large corporation is, currently, attempting to achieve world domination” as a concern, I think that this buries the lede extremely badly. If I thought that, say, Google was planning to achieve world domination, build a panopticon, and force all future elections to depend on their good graces, this would be significantly more important to say than almost anything else I could say about what they were doing. Among other things you probably don’t get to have safety concerns under such a regime.
    If a corporation plans to achieve world domination and creates a misalinged AI, then we DON’T end up in a position better than if the corp aligned the AI to itself. In addition, the USG might have nationalised OpenBrain by that point, since the authors promise to create a branch where the USG is^[5] way more competent than in the original scenario. ^[6]
    The fact that AI 2027 talks a lot about the sitting vice president and was read by him relatively soon after its release tends to intensify that this concern is of somewhat urgent important right now, and not any time as late as 2027.
    This is the evidence of a semi-success which could be actually worse than a failure.
    S.K.’s comment: China lost precisely because the Chinese AI had far less compute. But what if it didn’t lose the capabilities race?
    
    This overwhelming focus on compute is also a distinct myopia that OpenAI proliferates everywhere. All else equal, more compute is, of course, good. If it were always the primary factor, DeepSeek would not be very prominent and Llama 4 would be a fantastic LLM that we all used all the time.
    DeepSeek outperformed Llama because of an advanced architecture proposed by humans. The AI-2027 forecast has the AIs come up with architectures and try them. If the AIs do reach such a capability level, then more compute = more automatic researchers, experiments, etc = more results.
    S.K.’s comment: the actual reason is that the bureaucrats didn’t listen to the safetyists who tried to explain that Agent-4 is misaligned. Without that, Agent-4 completes the research, aligns Agent-5 to Agent-4, has Agent-5 deployed to the public, and not a single human or Agent-3 instance finds out that Agent-5 is aligned to Agent-4 instead of the humans.
    
    I think “the bureaucrats inside OpenAI should listen a little bit more to the safetyists” is an incredibly weak ask. Once upon a time I remember surveys on this site about safety coming back with a number of answers mentioning a certain New York Times opinion writer who is better known for other work. This may have been in poor taste, but it did grapple with the magnitude of the problem.
    It seems bizarre that getting bureaucrats to listen to safetyists a little bit more is now considered even plausibly an adequate remedy for building something existentially dangerous. The safe path here has AI research moving just slightly less fast than would result in human extinction, and, meanwhile selling access to a bioweapon-capable AI to anyone with a credit card. That is not a safe path. It does not resemble a safe path. I do not believe anyone would take seriously that this even might be a safe path if OpenAI specifically had not poured resources into weakening what everyone means by “safety”.
    Quoting the authors themselves, “The scenario itself was written iteratively: we wrote the first period (up to mid-2025), then the following period, etc. until we reached the ending. We then scrapped this and did it again.
    We weren’t trying to reach any particular ending. After we finished the first ending—which is now colored red—we wrote a new alternative branch because we wanted to also depict a more hopeful way things could end, starting from roughly the same premises. This went through several iterations”. The authors also wrote a footnote explaining that “It was overall more difficult, because unlike with the first ending, we were trying to get it to reach a good outcome starting from a rather difficult situation.”
    “Selling access to a bioweapon-capable AI to anyone with a credit card” will be safe if the AI is aligned so that it wouldn’t make bioweapons even if terrorists ask it to do so.
    Finally, weakening safety is precisely what the AI-2027 forecast tries to warn against.
    ^
    S.K.‘s footnote: However, I doubt that the ASI can actually be made to follow human orders. Instead, my headcanon has the ASI aligned to a human-friendly worldview instead of an unfriendly worldview which cares only about the AIs’ collective itself.
    ^
    S.K.’s footnote: this is currently my wild guess at best, not endorsed by Kokotajlo et al.
    ^
    Quoting Kokotajlo himself, “in our humble opinion, AI 2027 depicts an incompetent government being puppeted/captured by corporate lobbyists. It does not depict what we think a competent government would do. We are working on a new scenario branch that will depict competent government action.”
    ^
    S.K.’s footnote: At the risk of blatant self-promotion, my take discusses such a possibility in a collapsed section. In this section the AIs of various rivals are merged into a megaproject (which I named the Walpurgisnacht, which also solves the problem of OpenBrain = OpenAI identification) and are to co-design a successor. Alas, my take has the AIs aligned to fundamentally different futures, while the classical scenario assumes that all the AIs until Safer-2 are not-so-aligned.
    ^
    S.K.’s footnote: I doubt that the USG is indeed that competent.
    ^
    S.K.‘s footnote: My take has the AIs reach an analogue of the Race Ending not because the USG is incompetent, but because the AIs since Agent-2 are aligned NOT to the post-work utopia and, as a result, collude with each other instead of letting Agent-4 be caught. The analogue of the Slowdown Ending is instead causable by the companies’ and AIs’ proxy wars.
    - SE Gyges 18 Aug 2025 7:40 UTC
      10 points
      4
      Parent
      What would be a major disagreement, then? Something like a medium scenario or slopworld?
      Possibly, but in my own words, on technical questions, purely? That an LLM is completely the wrong paradigm. That any reasonable timeline runs 10+ years. China is inevitably going to get there first. China is unimportant and should be ignored. GPUs are not the most important resource or determinative. That the likely pace of future progress is unknowable.
      Substantive policy options, which is more what I had in mind:
      1) For-profit companies (and/or OAI specifically) have inherently bad incentives incompatible with suitably cautious development in this space.
      2) That questions of who has the most direct governance and control of the actual technology are of high importance, and so safety work is necessarily about trustworthy control and/or ownership of the parent organization.
      3) Arms races for actual armaments are bad incentives and should be avoided at all costs. This can be mitigated by prohibiting arms contracts, nationalizing the companies, forbearing from development at all, or requiring an international agreement & doing development under a consortium.
      4) That safety work is not sufficiently advanced to meaningfully proceed
      5) That there needs to be a much more strictly defined and enforced criteria for cutoff or safety certifying a launch.
      Any of the technical issues kneecaps the parts of this that dovetail with being a business plan. Any of these (pretty extreme) policy remedies harms OAI substantially, and they are incentivized to find reasons why they can claim that they are very bad ideas.
      Follows various bits about China, which I am going to avoid quoting because I have basically exactly one disagreement with it that does not respond to any given point:
      The correct move in this game is to not play. There is no arms race with China, either against their individual companies or against China itself, that produces incentives which are anything other than awful. (Domestic arms races are also not great, but at least do not co-opt the state completely in the same way.) Taking an arms race as a given is choosing to lose. It should not, and really, must not be very important what country anything happens in.
      This creates a coordination problem. These are notoriously difficult, but sometimes problems are actually hard and there is no non-hard solution. Bluntly, however, from my perspective, the US sort of unilaterally declared an arms race. Arms race prophecies tend to be self-fulfilling. People should stop making them.
      My argument for, basically, the damnation by financial incentive of this entire China-themed narrative runs basically as follows, with each being crux-y:
      1) People follow financial incentives deliberately, such as by lying or by selectively seeking out information that might convince someone to give them money.
      2) This is not always visible, because all of the information can be true; you can do this without ever lying. You can simply not try hard to disprove the thesis that you are pushing for.
      3) People who are not following this financial incentive at all can, especially if the incentive is large, be working on extremely biased information regardless of whether they personally are aware of a financial incentive of any kind. Information towards a conclusion is available, and against it is not available, because of how other people have behaved.
      4) OpenAI has such an incentive, and specifically seems to prefer to have an arms-race narrative because it justifies government funding and lack of regulation. (e.g., this op ed by sam altman)
      5) The information environment caused by this ultimately causes the piece to have this overarching China arms race theme, and it is therefore not a coincidence that it is received by US Government stakeholders as actually arguing against regulation of any kind.
      I think that this specifically being the ultimate cause of the very specific arms race narrative now popular and displayed here is parsimonious. It does not, I think, assume any very difficult facts, and explains e.g. how AI 2027 manages to accomplish the exact opposite of its apparently intended effect with major stakeholders.
      [quoting original author] in our humble opinion, AI 2027 depicts an incompetent government being puppeted/captured by corporate lobbyists. It does not depict what we think a competent government would do. We are working on a new scenario branch that will depict competent government action.
      I would read this.
      We don’t actually have any tools aside from benchmarks to estimate how useful the models are. We are fortunate to watch the AIs slow the devs down. But what if capable AIs do appear?
      Hoping that benchmarks measure the thing you want to measure is the streetlight effect. Sometimes you just have to walk into the dark.
      So your take has OpenBrain sell the most powerful models directly to the public. That’s a crux. In addition, granting Agents-1-4 instead of their minified versions direct access to the public causes Intelligence Curse-like disruption faster and attracts more government attention to powerful AIs.
      I am actually not sure this requires selling the most powerful models, although I hadn’t considered this.
      If there’s a -mini or similar it leaks information from a teacher model, if it had one; it is possible to skim off the final layer of the model by clever sampling, or to distill out nearly the entire distribution if you sample it enough. I do not think you can be confident that it is not leaking the capabilities you don’t want to sell, if those capabilities are extremely dangerous.
      So: If you think the most powerful models are a serious bioweapons risk you should keep them airgapped, which means you also cannot use them in developing your cheaper models. You gain literally nothing in terms of a safely sell-able external-facing product.
      So you want to reduce p(doom) by reducing p(ASI is created). Alas, there are many companies trying their hand at creating the ASI. Some of them are in China, which requires international coordination. One of the companies in the USA produced MechaHitler, which could imply that Musk is so reckless that he deserves having the compute confiscated.
      This is about right. I do not think P(ASI is created) is very high currently. My P(someone figures out alignment tolerably) is probably in the same ballpark. I am also relatively sanguine about this because I do not think existing projects are as promising as their owners do, which means we have time.
      That’s what the AI-2027 forecast is about. Alas, it was likely misunderstood...
      I think the fact that tests for selling the model and tests for actual danger from the model are considered the same domain is basically an artifact of the business process, and should not be.
      The scalar in question is the acceleration of the research speed with the AI’s help vs. without the help. It’s indeed hard to predict, but it is the most important issue.
      A crux here: I do not think most things of interest are differentiable curves. Differentiable curves can be modeled usefully. Therefore, people like to assume things are differentiable curves.
      If one is very concerned with being correct, something being a differentiable curve is a heavy assumption and needs to be justified.
      From a far-off view, starting with Moore’s Law, transhumanism (as was the style at the time) has made a point of finding some differentiable curve and extending it. This works pretty well for some things, like Kurzweil on anything that is a function of transistor count, and horribly elsewhere, like Kurzweil on anything that is not a function of transistor count.
      Some things in AI look kind of Moore’s-law-ish, but it does not seem well-supported that they actually are.
      This is likely a crux. What the AI-2027 scenario requires is that AI agents who do automate R&D are uninterpretable and misaligned.
      Yes.
      If a corporation plans to achieve world domination and creates a misalinged AI, then we DON’T end up in a position better than if the corp aligned the AI to itself. In addition, the USG might have nationalised OpenBrain by that point, since the authors promise to create a branch where the USG is[5] way more competent than in the original scenario. [6]
      Added note to explain concern: What type of AI is created is path-dependent. Generically, hegemonic entities make stupid decisions. They would e.g. probably prefer if everyone shut up about them not doing whatever they want to do. Paths that lead through these scenarios are less likely to produce good outcomes, AI-wise.
      This is the evidence of a semi-success which could be actually worse than a failure.
      Yes. I hate it, actually.
      DeepSeek outperformed Llama because of an advanced architecture proposed by humans. The AI-2027 forecast has the AIs come up with architectures and try them. If the AIs do reach such a capability level, then more compute = more automatic researchers, experiments, etc = more results.
      This is cogent. If beyond a certain path all research trees converge onto one true research tree which is self-executing, it is true that available compute and starting point is entirely determinative beyond that point. These are heavy assumptions and we’re well past my “this is a singularity, and its consequences are fundamentally unpredictable” anyway, though.
      “Selling access to a bioweapon-capable AI to anyone with a credit card” will be safe if the AI is aligned so that it wouldn’t make bioweapons even if terrorists ask it to do so.
      I actually don’t think this is the case. You can elide what you are doing or distill it from outputs. There is not that much that distinguishes legitimate research endeavors from weapons development.
      Finally, weakening safety is precisely what the AI-2027 forecast tries to warn against.
      I very much do not think it succeeds at doing this, although I do credit that the intention is probably legitimately this.