MIRI’s plan, to build a Friendly AI to take over the world in service of reducing x-risks, was a good one.
How much was this MIRI’s primary plan? Maybe it was 12 years ago before I interfaced with MIRI? But like, I have hung out with MIRI researchers for an average of multiple hours a week for something like a decade, and during that time period the plan seemed to basically always centrally be:
Try to make technical progress on solving the alignment problem
While trying to create a large public intellectual field that can contribute to solving that problem
While trying to improve the sanity of key decision-makers who will make a bunch of high-stakes decisions involved in AGI
This also seems to me like centrally the strategy I picked up from the Sequences, so it must be pretty old.
There was a period of about 4-5 years where research at MIRI pivoted to a confidential-by-default model, and it’s plausible to me that during that period, which I understand much less well, much more of MIRI’s strategy was oriented around directly building an aligned AI themselves.
That said, it seems like Carl Shulman’s prediction from 14 years ago was borne out pretty well:
If we condition on having all other variables optimized, I’d expect a team to adopt very high standards of proof, and recognize limits to its own capabilities, biases, etc. One of the primary purposes of organizing a small FAI team is to create a team that can actually stop and abandon a line of research/design (Eliezer calls this “halt, melt, and catch fire”) that cannot be shown to be safe (given limited human ability, incentives and bias).
After MIRI did a bunch of confidential research, possibly in an attempt to just build an aligned AI system themselves, they realized this wasn’t going to work, did a “halt, melt, and catch fire” move, and switched gears.
Rereading some of the old discussions in the posts you linked, I think I am more sold than I was previously that this was a real strategic debate at the time, and that a bunch of people argued in favor of just going and building it, and explicitly against pursuing strategies like human intelligence augmentation, which now look like much better bets to me.
To their credit, many of the people involved did work on both, and were pretty clear that they really weren’t sure whether the “solving the problem head on” part would work out, that they thought it would be reasonable for people to pursue other strategies, and that they themselves would pivot if that became clear to them later on. In a section of a paper that you yourself quoted 14 years ago, Eliezer says:
I do not assign strong confidence to the assertion that Friendly AI is easier than human augmentation, or that it is safer. There are many conceivable pathways for augmenting a human. Perhaps there is a technique which is easier and safer than AI, which is also powerful enough to make a difference to existential risk. If so, I may switch jobs. But I did wish to point out some considerations which argue against the unquestioned assumption that human intelligence enhancement is easier, safer, and powerful enough to make a difference.
Like, IDK, this really doesn’t seem like particularly high confidence, and while I agree with you that in retrospect you deserve some Bayes-points for calling this at the time, I don’t think Eliezer loses that many, as it seems like he placed substantial probability, all throughout, on your perspective being more right here.
How much was this MIRI’s primary plan? Maybe it was 12 years ago before I interfaced with MIRI?
Reposting this comment of mine from a few years ago, which seems germane to this discussion, but certainly doesn’t contradict the claim that this hasn’t been their plan in the past 12 years.
Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
[Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
He states explicitly:
As a research fellow of the Singularity Institute, I’m supposed to first figure out how to build a friendly AI, and then once I’ve done that, go and actually build one.
And later in the video he says:
The Singularity Institute was founded on the theory that in order to get a friendly artificial intelligence someone’s got to build one. So there. We’re just going to have an organization whose mission is ‘build a friendly AI’. That’s us. There’s like various other things that we’re also concerned with, like trying to get more eyes and more attention focused on the problem, trying to encourage people to do work in this area. But at the core, the reasoning is: “Someone has to do it. ‘Someone’ is us.”
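It was Yudkowsky’s plan before MIRI was MIRI.
http://sl4.org/archive/0107/1820.html
“Creating Friendly AI”
https://intelligence.org/files/CFAI.pdf
Both from 2001.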
What about the “Task AGI” and “pivotal act” stuff? That was, at the very least, advising others to think seriously about using aligned AI to take over the world, on the basis that the world was otherwise doomed without a pivotal act. Then there was the matter of how much leverage MIRI thought they had as an organization, which is complicated by the confidentiality.
What about the “Task AGI” and “pivotal act” stuff?
Plausible! Do you have a link handy? Seems better for the conversation to be grounded in an example, and I am not sure exactly which things you are referencing here.
On Arbital: Task-directed AGI and Pivotal act.
Offline, at MIRI there were discussions of possible pivotal acts, such as melting all GPUs. I suggested “what about using AI to make billions of dollars” and the response was “no, it has to be much bigger than that to fix the game board”. There was some gaming out of e.g. AI for uploading or nanotech. (Again, unclear how much leverage MIRI thought they had as an organization.)
Hmm, maybe I am misunderstanding this.
The “Task AGI” article, published on the open internet, is about an approach to building AGI that is safer than building a sovereign. I do not disagree that MIRI was working on trying to solve the alignment problem (as I say above, that is what two of the bullet points of my summary of their strategy are about), which this seems to be an attempt at making progress on. It doesn’t seem to me to be much evidence for “MIRI was planning to build FAI in their basement”. Yes, my understanding is that MIRI is expecting that at some point someone will build very powerful AI systems. It would be good for them to know how to do that in a way that has good consequences instead of bad. This article tries to help with that.
The “Pivotal Act” article seems similar? I mean, MIRI is still working on a pivotal act in the form of an international AI ban (perhaps subsequently followed by an intelligence augmentation program). I am working on pivotal acts all day! It seems like a useful handle to have. I use it all the time. It does seem to frequently be misunderstood by people to mean “take over the world”, but like, there is no example in the linked article of something like that. The most that the article talks about is:
upload humans and run them at speeds more comparable to those of an AI
prevent the origin of all hostile superintelligences (in the nice case, only temporarily and via strategies that cause only acceptable amounts of collateral damage)
design or deploy nanotechnology such that there exists a direct route to the operators being able to do one of the other items on this list (human intelligence enhancement, prevent emergence of hostile SIs, etc.)
Which really doesn’t sound much like a “take-over-the-world” strategy. I mean, the above still seems to me like a good plan; inasmuch as a leading lab has no choice but to pursue AGI as a result of an intense race, I would like them to give it a try. Like, it seems terribly reckless and we are not remotely on track to doing this with any confidence, but like, I am in favor of people openly publishing things that other people should do if they find themselves building ASI. And again, the above bullet list also really doesn’t sound like “taking over the world”, so I still have trouble connecting this to the paragraph in the OP I take issue with.
I suggested “what about using AI to make billions of dollars” and the response was “no, it has to be much bigger than that to fix the game board”. There was some gaming out of e.g. AI for uploading or nanotech. (Again, unclear how much leverage MIRI thought they had as an organization.)
None of these sound much like “taking over the world”? Like, yes, if you were to write a paper or blogpost with a plan that allowed someone to make a billion dollars with AI, that seems like it would basically do nothing, and if anything make things worse. It does seem like helpful contributions need to be both of a different type signature and much bigger than that.
It doesn’t seem to me to be much evidence for “MIRI was planning to build FAI in their basement”
I didn’t say that
The “Pivotal Act” article seems similar? I mean, MIRI is still working on a pivotal act in the form of an international AI ban (perhaps subsequently followed by an intelligence augmentation program). I am working on pivotal acts all day!
At the time it was clear MIRI thought AGI was necessary for pivotal acts, e.g. to melt all GPUs or to run an upload. I remember discussing “weak nanotech” and so on, and they didn’t buy it; they thought they needed aligned task AGI to do a pivotal act.
Which really doesn’t sound much like a “take-over-the-world” strategy.
Quoting the Task AGI article:
The obvious disadvantage of a Task AGI is moral hazard—it may tempt the users in ways that a Sovereign would not. A Sovereign has moral hazard chiefly during the development phase, when the programmers and users are perhaps not yet in a position of special relative power. A Task AGI has ongoing moral hazard as it is used.
So this is acknowledging massive power concentration.
Furthermore, in the context of the disagreement with Paul Christiano, it was clear that MIRI people thought there would be a much bigger capability overhang / FOOM, such that the system did not have to be “competitive”: it could be a “limited AGI” that was WAY less efficient than it could be, because of a pre-existing capability overhang versus the competition. Which, naturally, goes along with massive power concentration.
Wait, you didn’t? I agree you didn’t say “basement”, but the section of the OP I am responding to says:
MIRI’s plan, to build a Friendly AI to take over the world
And then you said:
What about the “Task AGI” and “pivotal act” stuff? [Which is an example of MIRI’s plan to build a Friendly AI to take over the world]
The part in square brackets seems like the very clear Gricean implicature here? Am I wrong? If not, what did you mean to say in that sentence?
All the other stuff you say seems fine. I definitely agree MIRI talked about building AIs that would be very powerful, and also considered whether power concentration would be a good thing, as it would reduce race dynamics. But again, I am just talking about the part of the OP that says it was MIRI’s plan to build such a system and take over the world, themselves, “in service of reducing x-risk”. None of the above seems like much evidence for that? If you agree that this was not MIRI’s plan, then sure, we are on the same page.
The part in square brackets seems like the very clear Gricean implicature here? Am I wrong? If not, what did you mean to say in that sentence?
See the two sentences right after.
That was, at the very least, advising others to think seriously about using aligned AI to take over the world, on the basis that the world was otherwise doomed without a pivotal act. Then there was the matter of how much leverage MIRI thought they had as an organization, which is complicated by the confidentiality.
The Gricean implicature of this is that I at least don’t think it’s clear that MIRI wanted to build an AI to take over the world themselves. Rather, they were encouraging pivotal acts generally, and there’s ambiguity about how much they were individually trying to do so.
The literal implication of this is that it’s hard for people to know how much leverage MIRI has as an organization, which implies it’s hard for them to know that MIRI wanted to take over the world themselves.
Cool, yeah. I mean, I can’t rule this out confidently, but I do pretty strongly object to summarizing this state of affairs as:
Of course the most central old debate was over whether MIRI’s plan, to build a Friendly AI to take over the world in service of reducing x-risks, was a good one.
Like, at least in my ethics there is an enormous gulf between trying to take over the world, and saying that it would be a good idea for someone, ideally someone with as much legitimacy as possible, who is going to build extremely powerful AI systems anyways, to do this:
upload humans and run them at speeds more comparable to those of an AI
prevent the origin of all hostile superintelligences (in the nice case, only temporarily and via strategies that cause only acceptable amounts of collateral damage)
design or deploy nanotechnology such that there exists a direct route to the operators being able to do one of the other items on this list (human intelligence enhancement, prevent emergence of hostile SIs, etc.)
I go around and do the latter all the time, and think more people should do so! I agree I can’t rule out from the above that MIRI was maybe also planning to build such systems themselves, but I don’t currently find it that likely, and object to people referring to it as a fact of common knowledge.
How much was this MIRI’s primary plan? Maybe it was 12 years ago before I interfaced with MIRI? But like, I have hung out with MIRI researchers for an average of multiple hours a week for something like a decade
In this post, I’m mostly talking about my debate with Eliezer more than 12 years ago, when SIAI/MIRI was still talking about building a Friendly AI (which we later described as “sovereign” to distinguish from “task” and “oracle” AI). (Or attempted or proxy debate, anyway, as I’m noticing that Eliezer himself didn’t actually respond to many of my posts/comments.)
However I believe @jessicata is right that a modified form of the plan to build a more limited “task” AI persisted quite a bit after that, probably into the time you started interfacing with MIRI. (I’m not highly motivated to dig for evidence as to exactly how long this plan lasted, as it doesn’t affect my point in the OP.) My guess as to why you got a different impression is that different MIRI people had different plans/intentions/motivations, with Eliezer being the most gung-ho on personally being involved in building some kind of world-altering AI, but also having the most power/influence at MIRI.
As a datapoint, I came into the field after reading the Sequences around 2011, as well as almost all of Yudkowsky’s other writing; then studying math and stuff in university; and then moving to the Bay in 2015. My personal impression of the strategic situation, insofar as I had one, was “AI research has already been accelerating, it’s clearly going to accelerate more and more, we can’t stop this, so we have to build the conceptual background which would allow one to build a (conceptual) function which takes as input an AI field that’s nearing AGI and gives as output a minimal AGI that can shut down AI research”. (This has many important flaws, and IDK why I thought that.)
Yudkowsky’s 2008 AI as a Positive and Negative Factor in Global Risk is a pretty good read, both for the content (which is excellent in some ways and easy to critique in others) and for the historical interest: it’s useful for litigating the question of what MIRI was aiming at around then, it’s interesting how much of the dynamic Yudkowsky anticipated/missed, and it’s interesting to inhabit 2008 for a bit and update on empirical observations since then.