I think this is a plausible top submission for a team term project in an undergraduate computer architecture class at an elite university.
sanxiyn
Evolution also distinguishes between one and two progeny so it is not binary, but yeah, just a few bits per lifetime.
I am quite uninformed, but when I read about compute multipliers I took it to obviously include data-related improvements. To quip, FineWeb-Edu was algorithmically filtered; it obviously wasn’t manually curated. As evidence that it is not just my misunderstanding, I quote Dean W. Ball (my point is that it may well be my misunderstanding, but then such a misunderstanding is common):
… Amodei describes this as a “compute multiplier”: … These gains come from all sorts of places: … improvements to training datasets that allow the model to learn more quickly …
OpenSSL is extremely widely used and it is hard to argue with OpenSSL CVEs. On the other hand, I am starting to suspect OpenSSL is a somewhat special case. My understanding is that, unfortunately for such a widely used codebase, the OpenSSL codebase is not in a good state. Someone on Hacker News noted that 0/12 of the CVEs apply to BoringSSL (Google’s OpenSSL fork).
I have far fewer problems with the curl CVEs and I think they are impressive.
The other lab leaders have not commented on the topic in public in 2025.
I don’t think this is true. Amodei on AI: “There’s a 25% chance that things go really, really badly”.
2025-08 update. Anthropic now uses your chats for AI training by default (you can opt out); see for example https://techcrunch.com/2025/08/28/anthropic-users-face-a-new-choice-opt-out-or-share-your-data-for-ai-training/
I think the IMO results were driven by general-purpose advances, but I agree I can’t conclusively prove it because we don’t know the details. Hopefully we will learn more as time goes by.
An informal argument: I think currently agentic software engineering is blocked on context rot, among other things. I expect IMO systems to have improved on this, since IMO time control is 1.5 hours per problem.
I think non-formal IMO gold was unexpected, and we heard explicitly that it won’t be in GPT-5. So I would wait to see how it pans out. It may not matter in 2025 but I think it can in 2026.
I think it is important to note “Gemini 2.5 Pro Capable of Winning Gold at IMO 2025”: it can, with good enough scaffolding and prompt engineering.
Do you know any Solomonoff inductor? I don’t, and I would like an introduction.
Ethan Mollick’s Using AI Right Now: A Quick Guide from 2025-06 is in the same genre and pretty much says the same thing, but the presentation is a bit different and it may suit you better, so check it out. Naturally it doesn’t discuss Grok 4, but it also does discuss some things missing here.
Anthropic does have a data program, although it is only for Claude Code, and it is opt-in. See About the Development Partner Program. It gives you a 30% discount in exchange.
CloudMatrix was not, but Huawei Ascend has been there for a long time, and was used to train LLMs even back in 2022. I didn’t realize AI 2027 predated CloudMatrix, but I still think ignoring China for Compute Production was unjustified.
This is a good argument and I think it is mostly true, but this absolutely should be on the AI 2027 Compute Forecast page. Simply not saying a word about the topic makes it look unserious and incompetent. In fact, that reaction happened repeatedly in my discussions with my friends in South Korea.
I know cyber eval results are underelicitation. Sonnet 4 can find zero-day vulnerabilities, which we are now in the process of disclosing. If you can’t get it to do that, it’s your skill issue.
Preordered the ebook version on Amazon. I am also interested in doing a Korean translation.
I disagree on DeepSeek and innovation. Yes, R1 is obviously a reaction to o1, but its MoE model is pretty innovative, and it is Llama 4 that obviously copied DeepSeek. But yes, I agree innovation is unpopular in China. From interviews with DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate.
Maybe we are talking about different problems, but we found instructing models to give up (literally “give up”, I just checked the source) under certain conditions to be effective.
Our experience so far is that while reasoning models don’t improve performance directly (3.7 is better than 3.6, but 3.7 extended thinking is NOT better than 3.7), they do so indirectly, because the thinking trace helps us debug prompts and tool output when models misunderstand them. This was not the result we expected, but it is the case.
An operating system that is unhackable for all practical purposes already exists: seL4. DARPA funded the HACMS program to use seL4. It worked.
seL4 is formally verified for functional correctness. It means the implementation corresponds to the specification. This verification eliminates all implementation bugs. It does not eliminate specification bugs, but seL4 also proved integrity, availability, and confidentiality of the OS. The definitions of integrity/availability/confidentiality are small enough to be manually reviewed, and were in fact widely reviewed (including by me).
So there IS a single mechanism that can rule out nearly all of the 30% of bugs left over after the 70% that are memory safety bugs are fixed. Formal verification for functional correctness is not particularly simple, but it is demonstrably possible, as seL4 proved, and AI assistance will make formal verification proof engineering cheaper.
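To make "the implementation corresponds to the specification" concrete, here is a toy sketch in Lean 4. It is only an illustration of the shape of a functional-correctness statement: the names `specSum`, `implSum`, and `implSumAux` are made up for this example, and it has nothing to do with seL4's actual proofs, which are done in Isabelle/HOL at a vastly larger scale.

```lean
-- Specification: the obvious, easy-to-review definition of summing a list.
def specSum : List Nat → Nat
  | [] => 0
  | x :: xs => x + specSum xs

-- Implementation: an accumulator-based loop, the kind of code one would
-- actually run, and where implementation bugs could hide.
def implSumAux (acc : Nat) : List Nat → Nat
  | [] => acc
  | x :: xs => implSumAux (acc + x) xs

def implSum (xs : List Nat) : Nat := implSumAux 0 xs

-- Helper lemma: the loop adds the specified sum onto its accumulator.
theorem implSumAux_eq (xs : List Nat) :
    ∀ acc, implSumAux acc xs = acc + specSum xs := by
  induction xs with
  | nil =>
    intro acc
    simp [implSumAux, specSum]
  | cons x xs ih =>
    intro acc
    simp [implSumAux, specSum, ih, Nat.add_assoc]

-- Functional correctness: the implementation agrees with the specification
-- on every input, so only the (small) specification needs human review.
theorem implSum_correct (xs : List Nat) : implSum xs = specSum xs := by
  simp [implSum, implSumAux_eq]
```

Once `implSum_correct` is checked, any remaining bug has to be in `specSum` itself, which is the same division of labor as above: the proof rules out implementation bugs, and the specification is kept small enough to review by hand.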