TsviBT comments on Relitigating the Race to Build Friendly AI

TsviBT 5 Dec 2025 1:01 UTC

LW: 5 AF: 4

That’s kind of surprising (that this is your response), given that you signed the Superintelligence Statement which seems to contradict this.

I think no one should build AGI. If someone is going to build AGI anyway, then it might be correct to make AGI yourself first, if you have a way to make actually aligned (hopefully task-ish or something).

are you now convinced that sovereign-FAI (I’m avoiding “takeover” due to objection from Habryka and this) was a real and serious plan, or do you want more evidence?

Let me rephrase. I already believed that there had been a plan originally, like 2004 +/- 3 years, to make sovereign AI. When I was entering the field, I don’t recall thinking too hard about “what do you do with it”, other than thinking about a Task-ish thing, but with Sovereign AI as a good test case for thought experiments. I don’t know when Yudkowsky and others updated (and still don’t).

I think you should (if you haven’t already) update more towards the view I have of Eliezer, that he is often quite seriously wrong and/or overconfident

I’m still not sure where you’re getting this? I mean, there are places where I would disagree with Yudkowsky’s action-stances or something. For example, I kinda get the sense that he’s planning “as though” he has confident short timelines; I don’t think confident short timelines make sense, but that’s different from how an individual makes their plans. For example, I’m working almost exclusively on plans that only pay off after multiple decades, which looks like “very confident of long timelines”, but I’m not actually very confident of that and I say so...

For example, the rejection of HIA you quoted again seems pretty tepid, which is to say, quite non-confident, explicitly calling out non-confidence. In practice one can only seriously work on one or two different things, so I wonder if you’re incorrectly inferring confidence on his part.

Maybe after doing this update, it becomes more plausible that his third idea (Task AGI, which I guess is the first that you personally came into contact with, and then spent years working towards) was also seriously wrong (or more seriously wrong than you think)?

I think I’m not following; I think Task AI is a bad plan, but that’s because it’s extremely difficult. I think you’re asking me to imagine that it is solved, but that we should have unknown-unknown type certainty about our solution; and I just don’t feel like I know how to evaluate that. If there was (by great surprise) some amazing pile of insights that made a safe Task-AGI seem feasible, and that stood up to comprehensive scrutiny (somehow), then it would plausibly be a good plan to actually do. I think you’re saying this is somehow overconfident anyway, and maybe I just disagree with that? But it feels hard to disagree with because it’s pretty hypothetical. If you’re argument is “but other people are overconfident and wrong about their alignment ideas” I think this proves too much? I mean, it seems to prove too much if applied to cryptography, no? Like, you really can make probably-quite-secure systems, though it takes a lot of work and carefulness and stuff, and the guarantees are conditional on certain mathematical conjectures and only apply to some class of attacks. But I mean, the fact that many people with a wide range of abilities could become wrongly overconfident in the security of their hand-rolled system, doesn’t prove an expert can’t know when their system is secure in certain senses.

Wei Dai 7 Dec 2025 5:58 UTC

LW: 5 AF: 4

AF Parent

Let me rephrase. I already believed that there had been a plan originally, like 2004 +/- 3 years, to make sovereign AI. When I was entering the field, I don’t recall thinking too hard about “what do you do with it”, other than thinking about a Task-ish thing, but with Sovereign AI as a good test case for thought experiments. I don’t know when Yudkowsky and others updated (and still don’t).

I got a free subscription to Perplexity Pro, and used it to generate a report about this. The result seems pretty good. The short answer to “when Yudkowsky updated (from Sovereign to Task AGI)” is very likely 2014 or 2015 (see second collapsed section below).

Perplexity AI research report on “using this post and comment and linked resources as a starting point, can you build a comprehensive picture of SIAI/MIRI’s strategy over time, especially pinpointing the specific dates that its strategy changed/pivoted, backed up by the best available evidence
https://www.lesswrong.com/posts/dGotimttzHAs9rcxH/relitigating-the-race-to-build-friendly-ai″

SIAI/MIRI’s strategy over time: phases and pivot points

The Machine Intelligence Research Institute (MIRI), originally the Singularity Institute for Artificial Intelligence (SIAI), has changed strategy multiple times.

Over 25 years, the strategy has moved through roughly these phases:

2000–2003: “Build the Seed AI ourselves, as fast as possible” (accelerationist + Friendly AI).
~2003–2010: “Friendly AI and existential risk” + heavy outreach, still with a strong self‑image as the eventual FAI builder.
2011–2013: Professionalisation and pivot to technical FAI research; formal mission broadened from “build FAI” to “ensure AI turns out well,” though key people still expected MIRI (or a tight equivalent) to build a world‑altering Friendly AI.
2013–2016: Mature “world‑altering Friendly AI” / CEV vision plus growing emphasis on Task‑AGI and “pivotal acts.”
2017–2020: New engineering‑heavy, largely confidential research program aimed at new foundations for alignable AI, with a clear plan around “minimal aligned AGI” performing a pivotal act.
2020–2022: Public admission that the 2017 program had “largely failed” and a period of strategic regrouping; increasingly pessimistic public tone (e.g. AGI Ruin).
2023–present: Official pivot to policy and communications first, with research now framed as secondary support for an “international halt” to progress toward smarter‑than‑human AI.

Below is a more detailed, evidence‑based timeline, with particular attention to when strategy clearly shifted.

1. Founding: “Create a Friendly, self‑improving AI” (2000–2003)

Founding mission (July 2000).
The organization is founded on 27 July 2000 as the Singularity Institute for Artificial Intelligence by Eliezer Yudkowsky and Brian & Sabine Atkins. The mission statement on the early IRS forms and timeline is explicit:

“Create a Friendly, self‑improving Artificial Intelligence.”timelines.issarice

Early technical writings like Coding a Transhuman AI and Creating Friendly AI 1.0 (2001) describe specific architectures for a transhuman AI that self‑improves while remaining benevolent. This is not just “study risks”; it is a plan to design the Seed AI.intelligence+1

Early accelerationism.
MIRI’s own 2024 retrospective states that at founding, the goal was:

“to try to accelerate to smarter‑than‑human AI as quickly as possible, on the assumption that greater‑than‑human intelligence entails greater‑than‑human morality.”intelligence

So the earliest SIAI strategy can be summarized as:

Build a self‑improving Seed AI (with Friendly properties).
Speeding up AGI is seen as good, provided it is Friendly.

Approximate pivot:
By MIRI’s own later account, Eliezer’s “naturalistic awakening” — realizing that superintelligence would not automatically be moral — occurs between 2000–2003, and:

“MIRI shifted its focus to the alignment problem around 2003.”intelligence+1

So around 2003 there is a conceptual pivot from “accelerate AGI, assume it’s benevolent” to “AI alignment is a hard, central problem.”

2. Mid‑2000s: Friendly AI + outreach, still as would‑be FAI builders (2003–2010)

Mission reformulation (2007 and 2010).
By 2007 the formal mission is updated to:

“developing safe, stable, and self‑modifying Artificial General Intelligence.”timelines.issarice

In 2010 it further becomes:

“To develop the theory and particulars of safe self‑improving Artificial Intelligence… and otherwise improve the probability of humanity surviving future technological advances.”timelines.issarice

This is already less “we will build it” and more “develop theory and improve odds,” but still assumes SIAI as central technical actor.

Strategic plan 2011: three‑pronged strategy.
The 2011 Strategic Plan lays out three “core strategies”:

“Solve open research problems in Friendly AI theory.”
“Mobilize additional human capital and other resources” (Singularity Summit, LessWrong, volunteers, outreach).
“Improve the function and capabilities of the organization.”intelligence

Near‑term (2011–2012) priorities emphasise:

Public‑facing research on “creating a positive singularity.”
Outreach/education/fundraising via the Summit and rationality community.
Organizational maturity.intelligence

So in this period, strategy is mixed:

Technical aim: make progress on FAI theory.
Meta‑aim: grow a rationalist / EA / donor ecosystem and mainstream attention.

Evidence they still saw themselves as the eventual FAI builders.

Several pieces of evidence support the picture that, in the 2000s and up to at least ~2011, SIAI’s central plan was that it (or a very close “functional equivalent”) would actually build the first Friendly AGI:

The MIRI timeline notes the original mission (2000–2006) as “Create a Friendly, self‑improving Artificial Intelligence.”timelines.issarice
In a video first hosted in 2011, Eliezer says:
“As a research fellow of the Singularity Institute, I’m supposed to first figure out how to build a Friendly AI, and then once I’ve done that go and actually build one…
The Singularity Institute was founded on the theory that in order to get a Friendly artificial intelligence someone’s got to build one. So there. We’re just going to have an organization whose mission is ‘build a Friendly AI’. That’s us. … At the core, the reasoning is: ‘Someone has to do it. “Someone” is us.’”lesswrong
Luke Muehlhauser’s 2013 post Friendly AI Research as Effective Altruism (looking back on the founding) explicitly confirms:
“MIRI was founded in 2000 on the premise that creating Friendly AI is the best way to ensure a positive Singularity…
MIRI co‑founder Eliezer Yudkowsky usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI.”intelligence

Wei Dai’s 2025 LessWrong post also recalls that he was “arguing against SIAI/MIRI’s plan to build FAI” as early as 2004. That is consistent with the explicit “someone is us” framing above.lesswrong+1

So for roughly 2000–2010, the strategic picture is:

Goal: Design and eventually build a Friendly, self‑improving AGI.
Means: Foundational theory + community‑building + outreach (Summits, LessWrong, rationality training).
Role: SIAI is not just an advocacy group; it is intended Seed AI shop.

3. 2011–2013: professionalisation and pivot to FAI math research

Here we see the first clearly documented organizational pivot with a date.

3.1. 2013: “MIRI’s Strategy for 2013” – research‑only, FAI math‑first

In April 2013, MIRI (new name) publishes MIRI’s Strategy for 2013. Key elements:intelligence

They have:
“once [been] doing three things — research, rationality training, and the Singularity Summit. Now we’re doing one thing: research. Rationality training was spun out to CFAR, and the Summit was acquired by Singularity University.”intelligence
After strategic planning with 20+ advisors in early 2013, they decide to:
“(1) put less effort into public outreach, and
(2) shift our research priorities to Friendly AI math research.”intelligence
They distinguish expository, strategic, and Friendly AI research and explicitly say:
“Strategic research will consume a minority of our research budget in 2013…”intelligence

In other words, as of early 2013:

Outreach and rationality training are explicitly deprioritised.
The core strategy is: focus limited staff and budget on math‑style Friendly AI research (logical uncertainty, self‑modification, decision theory) with some high‑quality strategy work, but little expository or broad public outreach.

3.2. 2013: bylaw / mission statement change

In the same period, MIRI’s formal mission statement is softened from “create Friendly AI” to a more conditional role. Luke writes in June 2013:

“When updating MIRI’s bylaws in early 2013, Yudkowsky and I came to a compromise on the language of MIRI’s mission statement, which now reads:
‘[MIRI] exists to ensure that the creation of smarter‑than‑human intelligence has a positive impact… Thus, the charitable purpose… is to: (a) perform research… (b) raise awareness… (c) advise… and (d) as necessary, implement a smarter‑than‑human intelligence with humane, stable goals’ (emphasis added).”intelligence

Luke adds:

“My own hope is that it will not be necessary for MIRI (or a functional equivalent) to attempt to build Friendly AI itself. But… I must remain open to the possibility that this will be the wisest course of action as the first creation of AI draws nearer.”intelligence

So early 2013 marks a formal pivot:

In documents: from “we exist to build FAI” to “we exist to ensure AGI goes well; we might build AGI if necessary.”
In practice: from “outreach + community + research” to “almost pure technical FAI math research.”

4. Circa 2013: the “world‑altering Friendly AI” plan

Wei Dai’s 2025 LessWrong post, which you linked, is primarily about this era. He characterizes the “circa 2013” strategy as:

“MIRI’s circa 2013 plan, to build a world‑altering Friendly AI, was a good one?”alignmentforum+1

His argument is that this plan was strategically bad even ex ante, because a small team attempting to directly build a Friendly sovereign AI faces “illegible safety problems” — traps they cannot even see.lesswrong

Does the evidence support that this really was the plan?

Luke’s 2013 post explicitly notes that Eliezer usually talks about MIRI specifically (or a functional equivalent) creating Friendly AI.intelligence
In the 2013 strategy conversation with Holden Karnofsky and others, Eliezer says:
“Civilization doesn’t seem to be handling AI friendliness on its own … so our mission: just go do it…
I have never held out much hope for approaching a random AGI project 30 years from now… and showing them a bunch of results… So it doesn’t necessarily have to be in MIRI’s basement. It’s going to be some project that shows what I would consider a reasonable level of concern… The present community that we’re trying to gather is not so much a community of people who are going to be influential on someone else. It’s more like a community of people who do research work.”openphilanthropy

This is consistent with: MIRI expects either itself, or a close “strategically adequate” project built around its ideas and people, to be the group that actually builds the first aligned AGI.
The LessWrong discussion notes that Yudkowsky “usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI,” and this needed explicit textual evidence by 2025 because it was no longer “common knowledge.”alignmentforum+1

Put together, the best reading is:

Around 2013, MIRI’s operational strategy is “do deep foundational technical work now so that MIRI or a tightly aligned group can someday build a world‑altering Friendly AI (often conceived as CEV‑like), thereby securing the future.”
Critics like Wei Dai were indeed pushing alternative strategies (e.g. invest in more geniuses via embryo selection / cloning and buy time by delaying AI efforts) as early as 2004.lesswrong

This is almost exactly the “race to build Friendly AI” picture the LessWrong post is relitigating, although there is internal disagreement over how much this should be described as “take over the world yourselves” versus “someone legitimate should, and we might be that someone.”lesswrong

5. Emergence of Task‑AGI and “pivotal acts” (≈2014–2017)

Over time, there is a shift in how MIRI talks about what the first aligned system should do.

Early concepts such as Coherent Extrapolated Volition (CEV) envisaged a very powerful sovereign AI that, roughly, does what humanity would want if we were smarter, more informed, and more coherent. That is a maximally ambitious alignment target.
Later, Yudkowsky distinguishes between:
- A Sovereign (a CEV‑like system running the long‑term future), and
- A Task AGI, a powerful but limited system intended to perform a pivotal act — some one‑time intervention that “upsets the gameboard” in our favor (e.g. uploading researchers, preventing hostile superintelligences, enabling nanotech), after which further decisions can be made under safer conditions.lesswrong+1

The LessWrong “pivotal act” and “Task AGI” pages (compiled later, but summarizing this thinking) explicitly give examples of a Task AGI that can:

Upload humans and run uploads at high speed.
Prevent the origin of hostile superintelligences, possibly temporarily.
Deploy nanotechnology that gives a route to intelligence enhancement or strong defense.lesswrong+1

These are very close to the examples quoted in the comment thread you provided.

When does this show up as strategy?

The clearest institutional articulation is MIRI’s 2017 Fundraiser and Strategy Update (Nov 30, 2017). That document lays out a multi‑step high‑level plan:effectivealtruism

Humanity needs a “stable period” to think clearly about long‑run futures and avoid locking in bad outcomes.
To get there, we must first end an “acute risk period” in which AGI could cause extinction.
MIRI expects this to require a “risk‑mitigating technology” developed via:
“AGI‑empowered technological development carried out by task‑based AGI systems.”effectivealtruism
That, in turn, requires:
“Construction of minimal aligned AGI… aligned AGI that has the minimal necessary capabilities for ending the acute risk period.”effectivealtruism
They stress the need for a “strategically adequate” project with strong opsec, research closure, commitment to the common good, and “white‑boxed system understanding.”effectivealtruism

This is, in effect, the Task‑AGI / pivotal‑act plan written down in strategic language.

So while Task‑AGI ideas were developing earlier, 2017‑11‑30 is the first strong documented pivot point where MIRI’s official strategic picture becomes:

Don’t aim first for a full CEV‑style sovereign.
Aim to enable a strategically adequate team to build a minimal aligned AGI capable of executing a pivotal act that ends the acute risk period.

Eli Tyre’s later comment, which you saw in the LessWrong thread, describes an “intentional switch, around 2016 (though I’m not confident in the date), from aiming to design a Friendly CEV‑optimizing sovereign AI to aiming to design a corrigible minimal Science‑And‑Engineering‑AI to stabilize the world.” That matches what shows up publicly in 2017.lesswrong

6. 2017–2018: new engineering‑heavy, nondisclosed‑by‑default research program

In parallel, there was a major research‑tactics pivot.

6.1. New research directions (2017–2018)

The 2017 fundraiser already notes that, with new funding and changed views, MIRI is “starting to explore new research directions while also continuing to push on our agent foundations agenda,” and is hiring software engineers to “experiment and explore some ideas in implementation.”effectivealtruism

The 2018 post 2018 Update: Our New Research Directions gives a detailed picture:intelligence

MIRI still has the Agent Foundations agenda (logical uncertainty, embedded agency, decision theory).openphilanthropy+1
“Recently, some MIRI researchers developed new research directions that seem to enable more scalable progress” by building software systems to test intuitions and iterate quickly.intelligence
High‑level themes include:
- “Seeking entirely new low‑level foundations for optimization, designed for transparency and alignability from the get‑go, as an alternative to gradient‑descent‑style machine learning foundations.”intelligence
- Figuring out parts of cognition that can be “very transparent as cognition.”intelligence

These directions are explicitly engineering‑heavy, Haskell‑centric, and aimed at deconfusion while building real code, not just pen‑and‑paper math.intelligence

6.2. “Nondisclosed‑by‑default” policy

The same 2018 post announces a major change in publication policy:

“MIRI recently decided to make most of its research ‘nondisclosed‑by‑default’, by which we mean that going forward, most results discovered within MIRI will remain internal‑only unless there is an explicit decision to release those results…”intelligence

Reasons include:

Worry that deconfusion results might also lead to capabilities advances.
Desire to let researchers think without constant self‑censorship about dual‑use risks.
Focus on speed of progress rather than exposition.intelligence

So 2018‑11‑22 marks a clear tactical pivot:

From publishing most core work to doing a large fraction of the new engineering‑heavy research under confidentiality.
From a more outreach‑oriented identity to “we really genuinely don’t mean outreach; we’re simply and only aiming to directly make research progress on the core problems of alignment.”intelligence

This is almost certainly the “4–5 years of confidential‑by‑default research” that Habryka references in the comment thread.lesswrong

7. 2020: admitting the 2017 research push “largely failed” and regrouping

By late 2020, MIRI states that the 2017–2019 “new directions” have not panned out as hoped.

In 2020 Updates and Strategy (Dec 20, 2020), Nate Soares writes:intelligence

2020 saw “limited progress in the research MIRI’s leadership had previously been most excited about: the new research directions we started in 2017.”intelligence
Senior staff have become “more pessimistic about this approach”:
“The non‑public‑facing research I (Nate) was most excited about had a flavor of attempting to develop new pragmatically‑feasible foundations for alignable AI… After pushing more on these lines of research, MIRI senior staff have become more pessimistic about this approach.”intelligence
As a result:
“We are currently in a state of regrouping, weighing our options, and searching for plans that we believe may yet have a shot at working.”intelligence

Some researchers leave; others attempt to “rescue” the 2017‑initiated lines; others pivot to new exploratory projects.intelligence

This is a clear strategic pivot dated 2020‑12‑20:

Official recognition that the previous core engineering program is unlikely to succeed.
An explicit search phase for new strategy, with no replacement yet fully specified.

8. 2022: “AGI Ruin” and strong public pessimism

Eliezer’s widely read 2022 essay AGI Ruin: A List of Lethalities further clarifies how MIRI leadership now see the situation.intelligence+1

Key points relevant to strategy:

Alignment, in the sense “get to less than certain doom on the first dangerous try,” is portrayed as extremely difficult; the problem is lethal and must be solved under both a time limit and a “first critical try” constraint.intelligence
He argues a pivotal act is still needed:
“We need to align the performance of some large task, a ‘pivotal act’ that prevents other people from building an unaligned AGI that destroys the world.”intelligence
But he also says he sees no realistic path to doing this under current conditions, and regards doom as the default outcome.

Although AGI Ruin is not itself a strategy document, it marks a public rhetorical pivot:

The earlier plan “MIRI (or a strategically adequate project we seed) devises and executes a pivotal act using aligned AGI” is treated as effectively unachievable.
This viewpoint sets the stage for the next major institutional shift: toward halting or pausing AGI rather than expecting to align and deploy it.

9. 2023–2024: pivot to policy and communications first

MIRI’s 2024 Mission and Strategy Update (Jan 3–4, 2024) is the most explicit top‑level strategy document and also provides a retrospective on earlier phases.lesswrong+2

After reviewing its history (founding accelerationism, 2003 alignment shift, 2006–2012 community‑building, 2013 Agent Foundations, 2017 engineering push, 2020 regrouping), it states:

“In 2023, MIRI shifted its strategy to pursue three objectives:
Policy: Increase the probability that the major governments of the world end up coming to some international agreement to halt progress toward smarter‑than‑human AI, until humanity’s state of knowledge… has drastically changed; and until we are able to secure these systems…
Communications: Share our models of the situation with a broad audience…
Research: Continue to invest in a portfolio of research… including technical alignment research (though we’ve become more pessimistic that such work will have time to bear fruit if policy interventions fail to buy the research field more time).”effectivealtruism+2

They add:

“Although we plan to pursue all three of these priorities, it’s likely that policy and communications will be a higher priority for MIRI than research going forward.”lesswrong+1

Thus, in 2023 (per the 2024 retrospective) MIRI’s top‑level strategy pivots to:

Primary: Push for an international halt / moratorium on smarter‑than‑human AI (policy) and make their models legible enough to shift the Overton window (communications).
Secondary: Continue alignment research, but now framed as conditional on policy success buying time.

This is a very sharp contrast with:

2013 (“less outreach, more technical research”).intelligence
2018 (“we really don’t mean outreach; we’re just trying to solve core alignment problems”).intelligence

Now, outreach and policy work are explicitly prioritized over new technical work.

10. Consolidated list of major strategic pivots with dates and evidence

Below is a concise list of the clearest, evidence‑backed pivot points.

Date / period	Pivot	Evidence
2000‑07‑27	Founding as SIAI with mission “Create a Friendly, self‑improving AI” and early accelerationist assumption “greater intelligence ⇒ greater morality.”	Founding mission on 2000 Form 990 and timeline; MIRI 2024 retrospective.timelines.issarice+1
~2003	Conceptual shift from “accelerate AGI, assume morality” to “alignment is hard and central”; focus moves to the Friendly AI problem and x‑risk.	MIRI 2024 history: “shifted its focus to the alignment problem around 2003.”intelligence+1
2007–2010	Mission reworded to “developing safe, stable, and self‑modifying AGI” and then “develop the theory… of safe self‑improving AI,” still with self‑image as eventual FAI builder.	Mission entries on timeline; early FAI texts.timelines.issarice+2
2011 strategic plan (Aug 2011)	First formal organisational strategy: three pillars (FAI research, outreach/human capital, organisational improvements); heavy emphasis on Summit and rationality outreach.	Strategic Plan, updated Aug 2011.intelligence
By ~2011 (video)	Eliezer states explicitly that SI’s mission is “first figure out how to build a Friendly AI, and then… actually build one… ‘Someone has to do it. “Someone” is us’,” confirming the self‑builder strategy.	2011 video quoted by Eli Tyre.lesswrong
Early 2013	Formal bylaw/mission change: from “build FAI” to “ensure that the creation of smarter‑than‑human intelligence has a positive impact,” with building AGI only “as necessary.”	Luke’s Friendly AI Research as Effective Altruism quoting the new bylaws.intelligence
2013‑04‑12	MIRI’s Strategy for 2013: pivot from “research + outreach + Summit + rationality” to “one thing: research,” with outreach and expository work sharply deprioritized and FAI math research prioritized.	MIRI’s Strategy for 2013.intelligence
Circa 2013 (world‑altering FAI plan)	Operational plan: MIRI or a closely allied small team aims to build a world‑altering Friendly AI (often sovereign / CEV‑like) to secure the future; this is the central “race to build Friendly AI” debated by Wei Dai et al.	Wei Dai’s retrospective; Luke’s confirmation that Eliezer talks of MIRI in particular creating FAI; 2013 strategy conversation stating “mission: just go do it,” possibly not “in MIRI’s basement” but via a tightly concerned project.lesswrong+3
2017‑11‑30	2017 Fundraiser and Strategy Update: explicit strategic model of success via “AGI‑empowered technological development” by task‑based AGI, construction of “minimal aligned AGI”, and need for a strategically adequate project; this is the Task‑AGI / pivotal‑act plan in official form.	2017 strategy/fundraiser post.effectivealtruism
2017–2018	Launch of “new research directions” — engineering‑heavy, software‑based deconfusion work — alongside Agent Foundations, enabled by new funding.	2017 fundraiser and 2018 update.effectivealtruism+1
2018‑11‑22	Adoption of “nondisclosed‑by‑default” policy: most new technical results kept internal unless clear safety upside in release; MIRI explicitly positions itself as doing non‑outreach, straight‑shot alignment research.	2018 Update: Our New Research Directions.intelligence
2020‑12‑20	2020 Updates and Strategy: senior staff judge that the 2017–2019 non‑public‑facing research “has, at this point, largely failed”; MIRI enters a state of regrouping and strategy search.	2020 Updates and Strategy.intelligence
2022‑06‑10	AGI Ruin: A List of Lethalities: public articulation that alignment is extremely difficult and likely to fail; still endorses need for a “pivotal act” but with very little hope of achieving it under current dynamics.	AGI Ruin.intelligence+1
2023 (described 2024‑01‑03)	Major strategy shift to policy and communications as higher priority than research: goal is to increase odds of an international agreement to halt progress toward smarter‑than‑human AI, while still doing some technical and supporting research.	MIRI 2024 Mission and Strategy Update.intelligence+2

How this lines up with the LessWrong post and comments

Starting from the post and comments you linked:

The claim that “MIRI’s circa‑2013 plan [was] to build a world‑altering Friendly AI” is well supported by:
- The original mission (“Create a Friendly, self‑improving AI”).timelines.issarice
- Eliezer’s 2011 statements that SI’s mission is literally to build Friendly AI and that “someone has to do it… ‘Someone’ is us.”lesswrong
- Luke’s 2013 remark that Eliezer usually talks about MIRI in particular (or a functional equivalent) creating FAI.intelligence
- The 2013 strategy conversation, which explicitly dismisses the idea of merely influencing “a random AGI project 30 years from now,” in favor of gathering a community that “just [goes and does] it” with safety baked in from the ground up.openphilanthropy
The “pivotal act” and “Task AGI” thinking that comes up in the comments corresponds closely to what MIRI formalizes in 2017 as “minimal aligned AGI” used for “AGI‑empowered technological development” to end the acute risk period.lesswrong+2
The comment‑thread reference to a 4–5‑year confidential research period matches the 2017–2018 launch of nondisclosed‑by‑default, engineering‑heavy research and the 2020 admission that this push had largely failed.intelligence+1
The observation that MIRI later pivoted away from “build FAI ourselves” because alignment turned out to be too hard is consistent with:
- The 2020 regrouping post.intelligence
- The strongly pessimistic tone of AGI Ruin (2022).intelligence
- The 2023 pivot toward policy and communication, treating large‑scale AI pause as central and technical alignment research as increasingly unlikely to arrive in time absent that pause.effectivealtruism+2

The main point of disagreement in the LessWrong discussion is about phrasing like “MIRI’s plan, to build a Friendly AI to take over the world,” which some participants (e.g. Habryka) see as overstating how explicitly “take over the world ourselves” was ever adopted as official organizational strategy, versus “someone should do something world‑altering, and maybe that someone is us or a very similar team.”lesswrong

On the factual side, however, the evidence is fairly clear that:

For roughly its first 10–15 years, SIAI/MIRI saw itself as a small group that might directly build the first Friendly superintelligence (initially a sovereign CEV‑like system, later a minimal Task‑AGI for a pivotal act).
Around 2013 it professionalized and narrowed to technical FAI work while still effectively planning to be (or seed) that project.
From 2017 onward it concentrated on an internal, engineering‑heavy attempt to find new foundations for alignable AI.
By 2020–2023 it concluded that this had failed and pivoted to policy and communication, with technical work now secondary and survival odds seen as hinging largely on halting or drastically slowing frontier AI.

That is the best evidence‑backed reconstruction of SIAI/MIRI’s strategic trajectory given the post, its links, and MIRI’s own later retrospectives.

Follow-up answer to “Can you dig deeper, for example into LessWrong posts/comments, to find out when Eliezer or MIRI first started thinking or talking about task AGI?”

Based on a deeper analysis of LessWrong posts, Arbital edit histories, and MIRI archives, the concept of Task AGI (and the associated shift to “pivotal acts” rather than immediate “Sovereign/CEV”) emerged earlier than the 2017 strategy update.

The specific “origin date” for the term and formalized concept is June 9, 2015.

Key Dates and Evidence

June 9, 2015 (The “Smoking Gun”): Eliezer Yudkowsky created the page “Task-directed AGI” on Arbital on this exact date.
- Evidence: Metadata from Arbital scrapes explicitly list the page as “by Eliezer Yudkowsky Jun 9 2015 updated Mar 25 2017.”arbital.obormot
- Significance: This proves the concept was being actively formalized in mid-2015, fully two years before it became the centerpiece of the 2017 public strategy.
May 24, 2016: The Arbital page for the “Do-What-I-Mean hierarchy” was created, which integrated “Task AGI” into a broader safety framework. This confirms that by 2016, MIRI had a developed internal language around “Task AGI” vs “Sovereign AGI” that was distinct from their earlier “Tool AI” discussions.arbital
2014 (The Intellectual Catalyst): The term likely crystallized as a response to Nick Bostrom’s book Superintelligence (published July 2014).
- Context: Bostrom popularized the typology of “Oracle, Genie, Sovereign, Tool.”
- Connection: The Arbital definition of Task AGI explicitly references Bostrom: “An advanced AI that’s meant to pursue a series of limited-scope goals… In Bostrom’s terminology, a Genie.”arbital.obormot
- Pivot Logic: MIRI had previously been skeptical of “Tool AI” (arguing that passive tools are unsafe). “Task AGI” appears to be their 2015 attempt to construct a safe version of a Genie/Tool—one that is agentic enough to be useful (perform a pivotal act) but limited in scope (“corrigible” and “mildly optimizing”) to avoid taking over the future.

Summary of the Evolution

The timeline of MIRI’s internal pivot to “Task AGI” can be reconstructed as:

2013–2014: Debate over “Tool AI” vs. “Oracle AI.” Eliezer is critical of the idea that AI can just be a “passive tool” (see Holden Karnofsky’s Singularity Institute Objection, 2012–2014).
July 2014: Bostrom’s Superintelligence provides new vocabulary (Genie vs. Sovereign).
June 2015: Eliezer formally documents “Task-directed AGI” on Arbital. The strategic hope shifts from “Build CEV immediately” to “Build a Task AGI to perform a pivotal act.”
2016: Internal consensus solidifies. MIRI researchers (like Eli Tyre and Nate Soares) begin treating “Task AGI for a pivotal act” as the standard plan. This is the “intentional switch around 2016″ mentioned in the comments.
Nov 2017: The strategy is officially announced to the public in the 2017 Fundraiser and Strategy Update.

In short: While the public saw the pivot in 2017, the intellectual shift happened in mid-2015, likely triggered by the digestion of Bostrom’s 2014 book and the need for a more pragmatic “minimum viable” target than full CEV.

Wei Dai 6 Dec 2025 2:02 UTC
LW: 3 AF: 3
1
AF Parent

I think no one should build AGI. If someone is going to build AGI anyway, then it might be correct to make AGI yourself first, if you have a way to make actually aligned (hopefully task-ish or something).

If Eliezer or MIRI as a whole had said something like this, especially the first part “I think no one should build AGI.” while pursuing their plans, I would be more tempted to give them a pass. But I don’t recall them saying this, and a couple of AIs I asked couldn’t find any such statements (until after their latest pivot).

Also I wouldn’t actually endorse this statement, because because it doesn’t take into account human tendency/bias to think of oneself as good/careful and others as evil/reckless.

I’m still not sure where you’re getting this?

Eliezer claiming to have solved metaethics. Saying that he wouldn’t “flinch from” trying to solve all philosophical problems related to FAI by himself. (man, it took me 30-60 minutes to find this link) Being overconfident on other philosophical positions like altruism and identity.

If there was (by great surprise) some amazing pile of insights that made a safe Task-AGI seem feasible, and that stood up to comprehensive scrutiny (somehow), then it would plausibly be a good plan to actually do.

I would be more ok with this (but still worried about unknown unknowns) if “comprehensive scrutiny” meant scrutiny by thousands of world-class researchers over years/decades with appropriate institutional design to help mitigate human biases (e.g., something like academic cryptography research + NIST’s open/public standardization process for crypto algorithms). But nothing like this was part of MIRI’s plans, and couldn’t be because of the need for speed and secrecy.
- TsviBT 6 Dec 2025 5:06 UTC
  LW: 2 AF: 1
  0
  AF Parent
  Ok. I think I might bow out for now unless there’s something especially salient that I should look at, but by way of a bit of summary: I think we agree that Yudkowsky was somewhat overconfident about solving FAI, and that there’s a super high bar that should be met before making an AGI, and no one knows how to meet that bar; my guess would be that we disagree about
  1. the degree to which he was overconfident,
  2. how high a bar would have to be met before making an AGI, in desperate straits.
  - Wei Dai 6 Dec 2025 5:42 UTC
    LW: 3 AF: 3
    1
    AF Parent
    Definitely read the second link if you haven’t already (it’s very short and salient), but otherwise, sure.
    - TsviBT 6 Dec 2025 5:45 UTC
      LW: 2 AF: 1
      0
      AF Parent
      (I did read that one; it’s interesting but basically in line with how I think he’s overconfident; it’s possible one or both of us is incorrectly reading in / not reading in to what he wrote there, about his absolute level of confidence in solving the philosophical problems involved.)
      - Wei Dai 6 Dec 2025 6:02 UTC
        LW: 3 AF: 3
        1
        AF Parent
        Hmm, did you also read my immediate reply to him, where I made the point “if you’re the only philosopher in the team, how will others catch your mistakes?” How to understand his (then) plan except that he would have been willing to push the “launch” button even if there were zero other similarly capable philosophers available to scrutinize his philosophical ideas?
        TsviBT 6 Dec 2025 6:56 UTC
        LW: 5 AF: 4
        3
        AF Parent
        (Also just recording that I appreciate the OP and these threads, and people finding historical info. I think the topic of how “we” have been going wrong on strategy is important. I’m participating because I’m interested, though my contributions may not be very helpful because
        
        I was a relative latecomer, in that much of the strategic direction (insofar as that existed) had already been fixed and followed;
        I didn’t especially think about strategy that much initially, so I didn’t have many mental hooks for tracking what was happening in the social milieu in terms of strategic thinking and actions.)
        TsviBT 6 Dec 2025 6:17 UTC
        LW: 3 AF: 1
        −1
        AF Parent
        (Oh I hadn’t read the full thread, now I have; still no big update? Like, I continue to see him being seemingly overconfident in his ability to get those solutions, but I’m not seeing “oh he would have mistakenly come to think he had a solution when he didn’t”, if that’s what you’re trying to say.)