Don’t have the link, but DeepMind researchers on X seem to have tacitly confirmed they had already reached gold. What we don’t know is whether it was done with a general LLM, like OAI’s, or a narrower system.
AlphaGo Moment for Model Architecture Discovery (arXiv)
Do you have specific predictions/intuitions regarding the feasibility of what you describe and how strong the feedback loop could be?
Since your post is about technical AI R&D automation capabilities, it immediately made me curious about the timelines, as those are what I’m somewhat worried about.
Also, would Sakana AI’s recent work on adaptive text-to-LoRA systems count towards what you’re describing?
Self-Adapting Language Models (from MIT, arXiv preprint)
Thank you for the quick reply.
That paper is contradicted by this new NVIDIA paper, which shows the opposite using a 1.5B distill of DeepSeek R1. I don’t have much technical knowledge, so a deep dive by someone more knowledgeable would be appreciated, especially in comparison to the Tsinghua paper.
Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI.
But I do have some quick thoughts:
Kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).
It seems to me like AlphaEvolve is more-or-less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor) notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to the improvement of AI hardware. What AlphaEvolve seems to do is to unify all of that into a superhuman model for those multiple uses. In the accompanying podcast they give us some further information:
The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
They have not tried to distill all that data into a new model yet, which seems strange to me considering they’ve had it for a year now.
They say that a lot of improvements come from the base model’s quality.
They do present the whole thing as part of research rather than a product.
So yeah, I can definitely see a path to large gains in the future, though for now those are still on similar timetables, per their own admission. They expect further improvements as base models improve, and they hope future versions of AlphaEvolve can in turn shorten model training time, speed up the hardware pipeline, and improve models in other ways. As for your point about novel discoveries, previous Alpha models already seemed able to do the same categories of research back in 2023, on mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare with previous models of the same lineage.
This is also a very small thing to keep in mind, but GDM doesn’t often share the actual results of its models’ work as usable/replicable papers, which has led experts to cast doubt on its results in the past. It’s hard to verify their claims, since they keep them close to their chests.
Thanks for the clarification.
Side question, but you had recently moved your AGI median from 2027 to 2028 after updating on Grok 3 and GPT-4.5. Has this changed, especially with Gemini 2.5 and o3/o4-mini + these new METR datapoints?
Google DeepMind’s recent FunSearch system seems pretty important; I’d really appreciate people with domain knowledge dissecting this:
Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements (Bang et al., 2023; Borji, 2023). This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in important problems, pushing the boundary of existing LLM-based approaches (Lehman et al., 2022). Applying FunSearch to a central problem in extremal combinatorics — the cap set problem — we discover new constructions of large cap sets going beyond the best known ones, both in finite dimensional and asymptotic cases. This represents the first discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
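Not an expert, but to make the mechanism concrete, here is a minimal sketch of the evolutionary loop the abstract describes, as I understand it. `llm_propose` and `evaluate` are hypothetical placeholders I made up, not DeepMind’s actual API:

```python
def funsearch_loop(llm_propose, evaluate, seed_program, iterations=1000, pool_size=20):
    """FunSearch-style loop: evolve programs that describe how to solve
    the problem, not the solutions themselves.

    llm_propose(examples) -> a new candidate program (source string)
    evaluate(program)     -> numeric score, or None if the program is invalid
    """
    pool = [(evaluate(seed_program), seed_program)]  # assumes the seed is valid
    for _ in range(iterations):
        # Show the LLM a few of the best programs found so far as context.
        best = sorted(pool, key=lambda sp: sp[0], reverse=True)[:2]
        candidate = llm_propose([program for _, program in best])
        score = evaluate(candidate)  # systematic, automated check
        if score is not None:        # hallucinated/broken programs are discarded
            pool.append((score, candidate))
            pool = sorted(pool, key=lambda sp: sp[0], reverse=True)[:pool_size]
    return max(pool, key=lambda sp: sp[0])  # best (score, program) found
```

The point the abstract stresses is that the evaluator scores candidates automatically, so confabulated programs get filtered out rather than trusted.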
Person’s Shortform
https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
AlphaCode 2, which is powered by Gemini Pro, seems like a big deal.
AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.
Seems important for speeding up coders or even model self-improvement, unless competitive coding benchmarks are misleading about real-world applicability to ML training.
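For intuition about what “a bespoke search and reranking mechanism” might mean in practice, here is a toy sample-filter-rerank sketch under my own assumptions; `generate_samples`, `passes_public_tests`, and `score` are invented placeholders, not the report’s actual components:

```python
def solve(problem, generate_samples, passes_public_tests, score,
          n_samples=10000, k=10):
    """Rough shape of a sample-filter-rerank pipeline."""
    # 1. Draw many candidate programs from the language model.
    candidates = generate_samples(problem, n=n_samples)
    # 2. Filter: keep only candidates that pass the problem's public tests.
    survivors = [c for c in candidates if passes_public_tests(problem, c)]
    # 3. Rerank survivors with a learned scoring model and submit the top k.
    survivors.sort(key=lambda c: score(problem, c), reverse=True)
    return survivors[:k]
```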
I also think the thing in question is not in fact an extremely important breakthrough that paves the path to imminent AGI anyway
Could you explain this assessment please? I am not knowledgeable at all on the subject, so I cannot intuit the validity of the breakthrough claim.
I can’t remember where from, but I know that Ilya Sutskever at least takes x-risk seriously. I remember him recently going public about how failing at alignment would essentially mean doom. I think it was published as an article on a news site rather than an interview, which is his usual format. Someone with a far better memory than me could find it.
How AI can give new abilities to humans (the author of this post is incapable of writing novels or making paintings, yet here we are).
(Not a serious comment, just a passing remark)
At the point where the AI is doing every step and the human has barely any actual contribution, I’m curious to see whether the standard for “artistic ability” will be loosened, or whether the pendulum will swing the other way, with artistic worth grounded more in craft, skill, and effort, which (by my intuition) is how artistic worth was judged back in the Renaissance, for example.
I am trying to see if it is true. I need other people to help me with this.
The whole thing generated enough buzz that Sam Altman himself debunked it in a Reddit comment (fitting, since he was CEO of Reddit at one point, after all).
People say that he made correct predictions in the past.
His past predictions are either easily explained by a common trick used by sports fans on Twitter, or have very shaky evidence behind them, since he keeps deleting his posts every few months, leaving us with third-party sources. Also, I wouldn’t a priori consider “GPT-5 finished training in October 2022 with 125T parameters” a correct prediction.
Or he was genuinely just making things up and tricking us for fun, and a cryptic exit is a perfect way to leave the scene. I really think people are reading way too much into him and ignoring the more outlandish predictions he’s made (125T GPT-4 and 5 in October 2022), along with the fact that there is never actual evidence of his accurate ones, only very specific and selective second-hand archives.
His predicting the GPT-4 launch date is easily explained by the confidence game: it’s possible he just posted a prediction for every day and deleted the ones that didn’t turn out right.
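To spell out why the trick works, a toy simulation (purely illustrative):

```python
import random

possible_dates = [f"2023-03-{day:02d}" for day in range(1, 32)]
actual_launch = random.choice(possible_dates)

# The trickster posts one prediction per candidate date...
posts = {date: f"GPT-4 launches on {date}" for date in possible_dates}

# ...then deletes every post that turned out wrong.
surviving_posts = {d: p for d, p in posts.items() if d == actual_launch}

print(surviving_posts)  # an observer always finds exactly one "correct" call
```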
For the Gobi prediction it’s trickier. The only evidence is the Threadreader capture and a random screenshot from a guy who seems clearly connected to Jimmy. I am very suspicious of the Threadreader one. On one hand I don’t see a way it could have been faked, but it’s very suspicious that the Gobi prediction is Jimmy’s only post saved there, despite him making an even bigger bombshell “prediction”. It’s also possible, though unlikely, that The Information somehow found his tweet and used it as a source for their article.
What kills Jimmy’s credibility for me is his prediction back in January (you can use the Wayback Machine to find it) that OAI had finished training GPT-5, no, not a GPT-5-level system but the ACTUAL GPT-5, in October 2022, and that it was 125T parameters.
It also goes without saying that pruning his entire account is suspicious too.
From occasionally reading what OSS AI gurus say, I can tell they definitely overhype their stuff constantly. The ones who make big claims and try to hype people up are often venture-entrepreneur types rather than actual ML engineers.
Because of LW, I genuinely get frustrated when other forums I browse don’t just copy the UI. It’s just too good.
Thanks for the link, will add it to the post. I originally included just the arXiv PDF viewer link for it; not sure what happened for it to be gone.