hauspost

Karma: 23

I run AI transformation work in Hamburg and have long been pondering the significance of AI alignment for humanity. AI 2027 resonated deeply with me—it was the first time some of my thoughts and worries were translated into concrete, evidence-based and falsifiable predictions. So I built the AI 2027 Tracker (https://ai2027-tracker.com) to keep score: 53 predictions extracted, tracked individually and updated weekly.

I believe AI is potentially the last human invention. I also believe the gap between “taking this seriously” and “not taking this seriously” will determine which companies, institutions, and societies make it. I want to help close that gap.

hauspost 11 Jun 2026 22:36 UTC
0 points
in reply to: Baybar’s comment on: Checking in on AI-2027
Good question. I should probably have been more precise: I don’t think all capability claims are behind, and I agree that some headline benchmark/revenue claims now look broadly on time or even stronger than expected.
The places I had in mind were more specific:
1. SWE-bench timing/comparability. The 85% numerical threshold now looks plausibly crossed in self-reported/leaderboard terms, but it arrived roughly 10-12 months after the mid-2025 target and comparability across scaffolding/eval setups is messy.
2. RE-Bench / AI R&D engineering. I have not seen a clean published 1.3+ RE-Bench result. METR time-horizon evidence is very encouraging, but I would not treat it as equivalent to the specific research-engineering benchmark target.
3. R&D productivity multiplier. This is the big one for me. The evidence for AI being useful inside AI labs is strong, but a clean public demonstration of a 1.5x AI R&D multiplier still seems missing. This is also where the authors’ later timeline revisions seem most relevant.
4. Training compute scale. I don’t treat this as a current falsification, since the 10^28 FLOP run is really a 2027 completion claim, but public estimates still look meaningfully below the aggressive compute path.
On Cybench and OSWorld specifically, I’m less confident saying “behind.” OSWorld’s 65% target looks basically confirmed, just late; the 80% early-2026 target is the part I’d still watch. Cybench also looks much stronger after the newer Mythos/Opus results, though I still care about subset/system-card vs uniform public eval issues.
So my shorter answer is: if “capabilities” means the broad direction of benchmark movement, I agree things look broadly on time. If it means the specific chain from benchmark scores → reliable long-horizon work → AI R&D acceleration, I think the evidence is still mixed, and some key claims are late or not yet cleanly demonstrated.

hauspost 2 Jun 2026 21:19 UTC
3 points
on: Checking in on AI-2027
I’m scanning how people on LessWrong have been tracking AI-2027 - it’s a bit of a hobby of mine as well.
So I loved this post and the discussion below it. Reading it eight months later, what stands out to me is how well many of the comments aged, especially the disagreement over whether “a month or two late” is a small miss or an early sign that the whole trajectory stretches out.
I have been tracking the AI 2027 predictions more systematically since then, across 53 separate claims. My current take is that both sides of that discussion got something important right.
I think the raw capability benchmarks look more mixed than the optimistic reading in the post. SWE-bench and OSWorld-style results did move quickly, but the benchmark comparability issues raised in the comments have become pretty obvious over the past months. Test-time compute, benchmark revisions, and the gap between “score goes up” and “real work gets done” make me more cautious than I was when I first looked at these numbers.
At the same time, the thread’s focus on METR task lengths now looks very well placed. That is still the signal I would watch most closely. The recent METR evidence seems more important to me than any isolated benchmark jump, because it speaks more directly to the question of whether agents are becoming useful over longer, messier work horizons—and this is core to the takeoff dynamics that make or break the scenario down the road.
As for what has changed since the discussions here, the pattern that surprised me most is that the security and governance side seems to be arriving earlier than the clean capability story. The AI 2027 scenario often presents the scary security consequences as downstream of very strong agents. In my tracker, the order looks messier. Cyber capability signals, lab-security pressure, government contracting, and model-control concerns are becoming concrete while some of the headline capability milestones are still late or ambiguous.
That has changed how I read AI 2027. Contrary to the argument in the article, I no longer look at the overall timeline that much. The more interesting insight may be which parts of the scenario arrive before the supposed prerequisites, and which alternative paths may open up to achieve the mother of all predictions: the AI R&D feedback loop. (The authors themselves shifted back the median predictions by 3-5 years, largely due to less confidence in this one prediction.)
I recently wrote up my view here on LessWrong, so if you commented on this article and want to catch up on what’s happened since, I invite you to pick up the discussion again!
The TLDR: I think AI 2027 is behind on several capability and compute claims, but still unusually useful as a structured map of the pressures now showing up in reality. The parts I would watch hardest are METR task horizons, real AI R&D productivity, and security incidents that look like side effects of general code/reasoning progress.

AI 2027 Tracker: One Year of Predictions vs. Reality

hauspost 21 Apr 2026 1:47 UTC

23 points

0 comments · 3 min read · LW link

(ai2027-tracker.com)
