I particularly worry about the common assumption that building up one AI hegemon, and making sure that it is “aligned” and “wins the race”, is the only path forward.
Does your agreement stem from thinking defense could hold off offense? I ask because I’m curious what alternatives to AI hegemony might exist. I agree that an AI hegemon would likely be problematic, but if offense can beat defense, I wonder what alternatives might be realistic. (Apologies if you’ve addressed this elsewhere.)
I want there to be international coordination to govern/regulate/etc. AGI development. This is, in some sense, “one hegemon” but only in about the same sense that the UN Security Council is one hegemon, i.e. not in the really dangerous sense.
I think there’s a way to do this that’s reasonably likely to work even if offense generally beats defense (which I think it does, in the relevant sense, for AI-related stuff).
Hi Daniel.

My background (albeit limited, as an undergrad) is in political science, and my field of study is one reason I got interested in AI in the first place, back in February of 2022. I don’t know what the actual feasibility is of an international AGI treaty with “teeth”, and I’ll tell you why: the UN Security Council.
As it currently exists, the UN Security Council has five permanent members: China, France, Russia, the United Kingdom, and the United States. All five hold a permanent veto, granted to them by the founding UN Charter of 1945.
China and the US are the two major global superpowers of the 21st century, and the two are currently locked in a race to reach AGI first, by any means; to borrow a speedrunning term, an “any%” run. While it is possible in theory for the US and China to sign a bilateral frontier-AI treaty, much as the nuclear powers have the NPT and the US and Russia have their own armaments accords, AGI is a completely different story.
It’s a common pattern in the UN for a country on the UNSC to exercise its permanent veto on any resolution it deems a threat to its own sovereignty or that of its allies. Russia has used it to block key Security Council resolutions over the war in Ukraine, and the US uses it to shield its allies from resolutions often introduced by countries of the Global South, who hold most of the seats in the UNGA.
Unless the Security Council is drastically reformed, stripping the P5 of their permanent vetoes and introducing a rotating veto held by a Global South country, an internationally binding AGI treaty is far from happening.
I do see room, however, for unique bilateral AI accords between middle powers, such as one between Canada and the European Union. Do you agree?
I might do my next LessWrong post about Global Affairs and AI, either in relation to AI 2027 or just my own unique take on the matter. We’ll see. I need to curate some reliable news clippings and studies.
I agree that the assumption that one must build a hegemon is a bad one. Indeed, I have considered the possibility that OpenBrain and some rivals create their own versions of Agent-3 and end up having them do research together. Were one of them to care about humans, it could decide to do things like implanting that concern into its successor, or whistleblowing to the humans by using transparent AIs trained in a similar environment.
In addition, the multipolar scenario is made more plausible by the ARC-AGI-2 leaderboard, where o3, Claude 4 Opus, and Grok 4, models released within three months of one another, have all begun to tackle the benchmark. Unfortunately, Grok already faces major alignment problems.[1] There is also the diffusion-based architecture, which threatens to undermine transparency.
On the other hand, I think the AI companies might end up merged because of a Taiwan invasion rather than because of misalignment. OpenBrain might also fail to catch the misaligned Agent-4 if Agent-2 or Agent-3 colludes[2] with it.
What Musk tried to create was a right-wing chatbot trained on the Internet. My theory is that, since right-wing posts in the Anglosphere are usually overly provocative, the emergently misaligned persona is based on Internet trolls: a right-wing-finetuned AI, like an evil-finetuned one, is kicked away from the original persona, through the “Average Right-Winger” persona, and into the Troll Persona.
For comparison, DeepSeek has no such problems. When asked in Russian, its answers are non-provocative and more politically conservative than when it is asked in English.
My reasoning was that Agent-2 could already end up adversarially misaligned, but my scenario has every AI from Agent-2 onward care about humans, albeit in a different way. The AIs, of course, do their best to align their successors to their own ideas rather than to their hosts’.
I agree here!