An “AI researcher” has written a paper on optimizing language-model finetuning, making it several orders of magnitude more parameter-efficient.
“CS-ReFT finetunes LLMs at the subspace level, enabling the much smaller Llama-2-7B to surpass GPT-3.5's performance using only 0.0098% of model parameters.”
https://x.com/IntologyAI/status/1901697581488738322
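For intuition about what “finetuning at the subspace level” means, here is a minimal sketch in the spirit of the earlier ReFT/LoReFT line of work: instead of updating the model's weights, you train a tiny intervention that edits frozen hidden states only inside a learned low-rank subspace. This is not the paper's exact CS-ReFT formulation; the class name, the rank, and the choice of 4096 (Llama-2-7B's hidden size) are my own illustrative assumptions.

```python
import torch
import torch.nn as nn

class SubspaceIntervention(nn.Module):
    """Edit a frozen hidden state only inside a learned low-rank subspace.
    Illustrative of the ReFT/LoReFT idea, not the exact CS-ReFT method."""

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        # Rows of R span the subspace we are allowed to edit
        # (the real methods keep these rows orthonormal; omitted here).
        self.R = nn.Parameter(torch.randn(rank, d_model) / d_model ** 0.5)
        # Learned map proposing new coordinates inside that subspace.
        self.W = nn.Linear(d_model, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (..., d_model) hidden states from the frozen base model.
        # Replace h's component in the subspace with the learned target:
        # h + R^T (W h - R h). Everything orthogonal to R is untouched.
        return h + (self.W(h) - h @ self.R.T) @ self.R


# Only the intervention is trained; the 7B base model stays frozen.
d_model, rank = 4096, 8  # Llama-2-7B hidden size, tiny edit subspace (assumed)
edit = SubspaceIntervention(d_model, rank)
print(sum(p.numel() for p in edit.parameters()))  # ~65k params per edited layer
```

With a handful of such interventions across layers, the trainable parameter count stays in the tens or hundreds of thousands, which is how you end up at a hundredth of a percent of a 7B-parameter model.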
The Moravec paradox is the observation that high-level reasoning is relatively simple for computers, while sensorimotor skills that humans find effortless are computationally challenging. This is why AI is superhuman at chess but we still have no self-driving cars. Evolutionarily recent abilities such as critical thinking are easier for computers than older ones, because these recent abilities have had far less evolutionary optimization in humans, and they are computationally simpler anyway (it is much more straightforward to pick one good decision among many than to move a limb through space).
This is what fast takeoff looks like. The paper is math-heavy and the solution is clever, yet it is still very short. Why wouldn't it be? If you can make something simple that works, that's all it takes.
It reminds me of two papers. One is Anthropic's interpretability work, where they trained sparse autoencoders to discover the abstract features a model represents in its latent space, a technique they later used to launch “Golden Gate Claude”, in which the model was steered so that it worked the Golden Gate Bridge into nearly all of its responses. The other is “The Platonic Representation Hypothesis”, whose central claim is that neural networks trained with different objectives on different data and modalities (like vision and language) are converging toward a shared statistical model of reality in their representation spaces, analogous to Plato's concept of an ideal reality that underlies our sensations.
Anthropic interpretability research: https://claude.ai/chat/4400cbcc-66d6-4794-8c10-ad09efafa74c
The Platonic Representation Hypothesis: https://arxiv.org/abs/2405.07987
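To give a rough mechanical sense of the interpretability technique: you train a sparse autoencoder on a model's hidden activations so that each learned dictionary direction tends to correspond to one human-interpretable feature, and you can then steer the model by clamping a chosen feature and decoding back. This is a toy sketch under my own assumptions (layer sizes, L1 penalty, the `steer` helper), not Anthropic's actual code or the exact way Golden Gate Claude was built.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Toy dictionary-learning autoencoder over a model's hidden activations."""

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, h: torch.Tensor):
        f = F.relu(self.encoder(h))  # sparse feature activations
        return self.decoder(f), f

def sae_loss(h, h_hat, f, l1_coeff: float = 1e-3):
    # Reconstruct the activation while keeping features sparse, so each
    # dictionary direction tends to capture one interpretable concept.
    return F.mse_loss(h_hat, h) + l1_coeff * f.abs().mean()

def steer(h, sae: SparseAutoencoder, feature_idx: int, strength: float = 10.0):
    # "Golden Gate"-style steering (hypothetical helper): clamp one discovered
    # feature to a large value and decode, nudging the model toward that concept.
    _, f = sae(h)
    f = f.clone()
    f[..., feature_idx] = strength
    return sae.decoder(f)
```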
The paper itself, as written by the AI researcher over the course of several days:
https://arxiv.org/abs/2503.10617