Why do you believe the loss goes down “semi-uselessly”? By now I think it’s generally accepted as conventional wisdom that RL, including RLVR, doesn’t create much deep knowledge in the LLM (which, I guess, would theoretically be unexpected, given the small magnitude of its updates) but rather converts it from just token-prediction into practical, economically valuable skills. As an example, see pages 2, 3 and 21 of the Composer 2 technical report from March: https://cursor.com/resources/Composer2.pdf
Petropolitan
Petropolitan’s Shortform
Sorry, I made a mistake in this claim, as “the Chicago Pile” was not actually the first experimental pile to gather data on reactor construction. I should have written “the nuclear piles of 1939-1941” instead; the 2021 article An inter-country comparison of nuclear pile development during World War II (PDF)[1] describes their history and design. The 1948 Manhattan District History conveniently summarizes the state-of-the-art knowledge of 1940 in its book about research on reactors:
An average of one to three high-speed neutrons are released in the fission of a uranium nucleus.
These fast neutrons can be slowed down or “moderated” to the speeds of gas molecules at ordinary temperatures by elastic collisions with relatively inert atoms such as carbon, helium, or hydrogen.
Fast neutrons cause fission in uranium-235 and uranium-238. However, slow neutrons cause fission of U-235 but do not cause fission of U-238; instead, they react with U-238 to form transuranic elements, neptunium and plutonium.
Fission of thorium and protoactinium, two other heavy elements, is caused only by fast neutrons.
Extremely high kinetic energy is imparted to the fission fragments, which are identified as radioactive isotopes of elements with atomic masses approximately half the mass of the uranium atom.
This is the state of knowledge I actually had in mind. Indeed, it does include a general understanding of criticality and chain reactions but stops short of control rods, understanding of impurities, etc. I struck out the incorrect words in my comment above and corrected myself in two footnotes.
Do you believe this state of understanding is more, less or roughly comparable to the contemporary knowledge in AI alignment in general or LLM alignment specifically? Because this is why we are having this debate: to use history as an analogy.
[1] BTW, the same author published two books, The History and Science of the Manhattan Project and The Physics of the Manhattan Project, which might be of interest to you if you enjoyed Rhodes (and they are available on pirate websites).
The reason I wanted to limit this to major branches of science and engineering is that evaluating thousands of technologies on the scale of “first-principles theoretical insights vs. scientific trial and error” is a large-scale research project, while a few dozen branches of science with one or two related major technologies each can be investigated manually. Besides, finding one counterexample among a few dozen is a stronger falsification than finding one among 100+, which is still stronger than finding one among thousands.
I am not sure why you believe stealth aircraft are so important, as we see that even countries which have such aircraft prefer to use cruise missiles, drones and conventional jets as much as possible, and stealth planes are used for specialized missions, if at all. As a speculative counterfactual, I would argue that neither the Russo-Ukrainian War nor the War in Iran would have gone much differently if stealth had never been invented.
The “3 OOMs shape, 1 OOM absorption” breakdown is interesting, especially since you seem to base it on one expert whose exact expertise you don’t specify. Have you tried to locate any sources in the public literature, or to ask the expert how they estimated it?
Let me acknowledge that I could have worded my comments in this thread better and less ambiguously. I believe the term of “narrow understanding” corresponds well to the current state of affairs in AI alignment while “developed theory” corresponds poorly, which is why I have contrasted them. You might choose other words but I believe you need some kind of opposition to use it as an analogy for the purpose of the post above.
The history of relativity is very well documented: it began as a solution to inconsistencies uncovered by experiments, and there were plenty of unsuccessful and largely forgotten predecessors in a similar spirit; see https://en.wikipedia.org/wiki/History_of_special_relativity
The nuclear bomb is technically a major technology: the branch of physics is nuclear physics, while the engineering field is nuclear engineering, and both were built with scientific trial and error. There was no developed theory of reactors when the Chicago Pile was[1] constructed; it was the other way round. Only then was the information gathered on the first reactor[2] applied to nuclear bomb design. For example, to design the bomb you need the cross-sections of nuclear reactions and the average numbers of neutrons, which are not derivable from first principles.
Stealth aircraft are not even a major technology; they are just one of the many thousands of technologies in our modern life which have neither a corresponding branch of science nor a field of engineering. The main limiting factor in developing stealth airplanes was materials: your essay seems to falsely imply that the F-117 doesn’t use radar-absorbent materials, and it ignores the fact that the geometry of modern stealth fighters generally has no more flat surfaces than that of modern non-stealth fighters. And needless to say, there was plenty of scientific trial and error in developing these materials.
And “atheoretical” trial and error has not really been used since the 18th century, so applying it to my comment when I specifically said “informed by narrow understanding” seems like a strawman. In these cases there is always some kind of vague theoretical understanding of what might happen (i.e., if you put large enough amounts of enriched uranium together it might initiate a chain reaction, but what amount, at what enrichment, and whether it will stop on its own, one doesn’t know until experiments are made).
I know about this counterexample because GR is known in history of physics as the exception to this rule, and it’s the main reason why I used the word “usually” in the quote.
I didn’t consider general relativity as “a major branch of science” when writing this sentence and didn’t expect someone would argue otherwise. It’s an important theory (or even theoretical framework) in a branch of physics called relativity, which was itself founded based on experimental findings from the turn of the century.
I have now looked it up in a dictionary and realized that people tend to call a “branch” every field of study which has its own research departments (I should have done this earlier and added this caveat). Under this sensu lato definition the number of such branches is over 100, and 99% of them were born in the way I described.
No, 5.5 can’t be derived from 4.5 because they use wildly different image tokenizers: 4.5 uses exactly that of 4o, while 5.5 has a variation of a newer architecture introduced in 5.1 in October 2025! https://developers.openai.com/api/docs/guides/images-vision
(Also it’s now quite hard for me to imagine 5.1 adapted from 5 with such a difference)
Thanks again for that (and the hat tip), I will link to your post in the future when this topic comes up again! As a side note, I would have laid out that section as a postscript because that’s what it basically is ;-)
Thanks a lot (and big thanks to Helen Toner!), do you think you could add AI capable of a takeover, or rather a civilization-ending failed takeover attempt (as a lower and more practical bar) to the post?
they may be smart, agentic, and competent enough to be takeover-capable themselves
Might we just put aside AGI and ASI buzzwords and use “takeover-capable AI” as a go-to term instead? I believe it’s of secondary importance whether a takeover-capable AI system is “general” (whatever that means) and/or “superhuman”
Maybe, but a literal reading would assign Mythos Preview to “other Anthropic models” and thus allow it to be used by both the US Government and a few non-American Glasswing partners
Doesn’t Glasswing use Mythos Preview, not Mythos 5?
There’s an argument you have likely encountered at least once: that the empirical work on non-superintelligent alignment will be useful for aligning ASI (in the Yudkowskian sense) as well, and since any human coordination is imperfect and we can only delay the development of the latter for a limited amount of time, this is the only realistic way to go.
Also, I’m pretty sure very few people in the field back at the time understood the “no historical track record” part. Seems likely to be a selection effect: the people who did probably abstained from entering AI safety in the first place
On the contrary, rockets are an excellent example confirming my thesis! Actually, the first rockets were built empirically in ancient China, and Early Modern Indians developed them into such an effective weapon that the British copied them, improved them and literally burned Copenhagen with just ~300 of them in 1807. So all the theoretical insights on rocket ballistics in the early 20th century were built upon the military research of the 19th century.
If you have better counterexamples, please present them
Since at least the 18th century, every single new science and every major branch of science (not to speak of engineering itself and its branches) was built by “trial and error informed by a narrow understanding” (sometimes called “scientific trial and error”) and not by abstract theoretical insights, which usually came much later.
In retrospect this assumption that AI would somehow be the other way round, in my opinion, looks quite silly
I have not seen any research backing this claim up, have you?
A very interesting paper quantifying the conventional wisdom that a large degree of progress is from inference scaling.
You have got access to GPT-3 and 3.5 from OpenAI, but have you tried to get the same for the now publicly unavailable Claude models, for better coverage of the first half of 2025?
I liked your post a lot, particularly section 5a, and strongly upvoted it. However, today I read something of a counter-essay by Dwarkesh Patel: https://www.dwarkesh.com/p/the-sample-efficiency-black-hole It’s a good piece which I would recommend to all readers of this blogpost.
In particular, the counterargument I found strongest was about scaling:
The way the scaling law equations work is that parameter and data terms are added to the loss independently. If you have a model that is trained compute optimally, and suppose you ask, well what if I just wanna maximize sample efficiency and use less data—and I’ll throw in as many parameters as it takes to make that happen. With the constants from the Chinchilla scaling laws paper (and the nature of the result wouldn’t change even with different constants), even if you increased the number of parameters by infinity, that would only decrease by a factor of ~10 the amount of data you need in order to keep the same loss. Humans are somewhere between thousands to millions of times more sample efficient than these models. Scaling of current models simply can’t make up for that discrepancy. This really does suggest that humans are on a different scaling curve altogether.
I tried to refute that with some Fermi estimates but ran into difficulties with converting between bytes and tokens. Mathematically the logic is sound. I wonder how you would answer?
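For reference, the “factor of ~10” in the quote can be reproduced with a minimal sketch. It assumes the parametric fit L(N, D) = E + A/N^α + B/D^β with the constants published in the Chinchilla paper (A = 406.4, B = 410.7, α = 0.34, β = 0.28), and that “compute optimal” means minimizing this loss under a compute budget C ≈ 6ND. The function names and the example dataset size are mine and purely illustrative.

```python
import math

# Assumed parametric fit (Chinchilla paper, Hoffmann et al. 2022):
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# E is omitted because it cancels when comparing two losses.
A, B = 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def compute_optimal_params(d_tokens: float) -> float:
    """Parameter count N that is compute-optimal for D tokens.

    Minimizing the loss under C ~ 6*N*D gives the balance condition
    ALPHA * A / N**ALPHA == BETA * B / D**BETA, solved here for N.
    """
    return (ALPHA * A * d_tokens**BETA / (BETA * B)) ** (1 / ALPHA)


def reducible_loss(n_params: float, d_tokens: float) -> float:
    """Loss above the irreducible term E."""
    return A / n_params**ALPHA + B / d_tokens**BETA


def data_saving_with_infinite_params(d_tokens: float) -> float:
    """Factor by which D can shrink at equal loss if N goes to infinity."""
    n_opt = compute_optimal_params(d_tokens)
    target = reducible_loss(n_opt, d_tokens)
    # With N -> infinity only the data term remains: B / d_min**BETA == target
    d_min = (B / target) ** (1 / BETA)
    return d_tokens / d_min


# Illustrative dataset size (an assumption); the ratio does not depend on it.
factor = data_saving_with_infinite_params(1.4e12)
print(f"data reduction factor: {factor:.2f}")
```

Under these assumptions the ratio simplifies to (1 + β/α)^(1/β), about 8.5, regardless of the starting dataset size, which is consistent with the quoted figure.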
Or it’s a different quant, or the initial pricing included a big surcharge for the lack of anything even roughly comparable at the time (similar to o1, which probably wasn’t different in size from 4o or o3 despite almost an OOM cost difference), or they used to objectively lack hardware to serve it and the price reflected that
Epoch’s own research on this topic actually seems to correspond very closely to that of Artificial Analysis: https://epoch.ai/data-insights/open-closed-eci-gap