(Plus ongoing poor results on Pokémon: modern LLMs can still only win with elaborate task-specific scaffolding.)
Though performance on the IMO seems impressive, the very few examples of mathematical discoveries by LLMs don’t seem (to me) to be increasing much in either frequency or quality, and so far are mostly of the type “get a better lower bound by combinatorially trying stuff,” which seems to advantage computers with or without AI. Also, even that type of example is rare; probably the vast majority of such attempts have failed, and we only hear about the few successful ones, none of which seem to have been significant for any reason other than coming from an LLM.
I increasingly suspect a lot of the recent progress in LLMs has been illusory, coming from overfitting to benchmarks which may even leak into the training set (am I right about this?) while merely seeming useful, and that METR is sufficiently good at their job that this will become apparent in task-length measurements before the 8-hour mark.
I’m trying to make belief in rapid LLM progress pay rent, and at some point benchmarks are not the right currency. Maybe that point is “not yet” and we see useful applications only right before superintelligence etc. but I am skeptical of that narrative; at least, it does little to justify short timelines, because it leaves the point of usefulness to guesswork.
Are you looking for utility in all the wrong places?
Recent news has quite a few mentions of AI tanking the job prospects of fresh grads across multiple fields and, at the same time, AI causing a job-market bloodbath in the usual outsourcing capitals of the world.
That sure lines up with known AI capabilities.
AI isn’t at the point of “radical transformation of everything” yet, clearly. You can’t replace a badass crew of x10 developers who can build the next big startup with AIs today. AI doesn’t unlock all that many “things that were impossible before” either—some are here already, but not enough to upend everything. What it does instead is take the cheapest, most replaceable labor on the market, and make it cheaper and more replaceable. That’s the ongoing impact.
The “entry-level jobs” study looked alright at a glance. I did not look into the claims of outsourcing job losses in any more detail—only noted that it was claimed multiple times.
There’s this recent paper, see Zvi’s summary/discussion here. I have not looked into it deeply. Looks a bit weird to me. Overall, the very fact that there’s so much confusion around whether LLMs are or are not useful is itself extremely weird.
(Disclaimer: off-the-cuff speculation, no idea if that is how anything works.)
I’m not sure how much I buy this narrative, to be honest. The kind of archetypical “useless junior dev” who can be outright replaced by an LLM probably… wasn’t being hired to do the job anyway, but instead as a human-capital investment? To be transformed into a middle/senior dev, whose job an LLM can’t yet do. So LLMs achieving short-term-capability parity with juniors shouldn’t hurt juniors’ job prospects, because they weren’t hired for their existing capabilities anyway.
Hmm, perhaps it’s not quite like this. Suppose companies weren’t “consciously” hiring junior developers as a future investment; that they “thought”[1] junior devs were actually useful, in the sense that if they had “known” juniors were just a future investment, they wouldn’t have hired them. The appearance of LLMs as capable as junior devs would then remove the pretense that junior devs provide counterfactual immediate value. So their hiring would stop, because middle/senior managers would be unable to keep justifying it, despite the quiet fact that juniors were effectively not being hired for their immediate skills anyway. And so the career pipeline would get clogged.
Maybe that’s what’s happening?
(Again, no idea if that’s how anything there works, I have very limited experience in that sphere.)
In a semi-metaphorical sense, as an emergent property of various social dynamics between the middle managers reporting on juniors’ performance to senior managers who set company priorities based in part on what would look good and justifiable to the shareholders, or something along those lines.
This is the hardest evidence anyone has brought up in this thread (?), but I’m inclined to buy your rebuttal that the trend really started in 2022, which is hard to attribute to LLMs.
I don’t think it’s reasonable to expect such evidence to appear after such a short period of time. There was no hard evidence that electricity was useful in the sense you are talking about until the 1920s. Current LLMs are clearly not AGIs in the sense that they can integrate into the economy like migrant labor; therefore, productivity gains from LLMs are bottlenecked on users.
I find this reply broadly reasonable, but I’d like to see some systematic investigations of the analogy between gradual adoption and rising utility of electricity and gradual adoption and rising utility of LLMs (as well as other “truly novel technologies”).
There is a difference between adoption as in “people are using it” and adoption as in “people are using it in an economically productive way”. I think the supermajority of productivity from LLMs is realized as pure consumer surplus right now.
My impression is that so far the kinds of people whose work could be automated aren’t the kind to navigate the complexities of building bespoke harnesses to have LLMs do useful work. So we have the much slower process of people manually automating others.
I want an option to filter for writing with zero LLM influence.
I do not trust LLMs and I am not sure how I feel about LLM / human collaboration. As systems become more powerful, I am worried that they may become a serious epistemic hazard, up to and including actually hacking my brain. I would like to be able to protect myself from this aggressively.
For that reason, I think that the current LW policy on LLM usage is insufficient. Every post that uses an LLM in any part of its production process whatsoever should be flagged as such. Personally, I am currently willing to accept some LLM usage upstream of the writing I read and I would not routinely filter such posts out of my feed, but I would like the option to do so (which I would occasionally use as a sanity check) very aggressively and with no exceptions. Basically, an off-switch.
I would also like to be able to filter out any writing of which even a single word is LLM generated (except perhaps parenthetically). I think I would use this option routinely, though perhaps I would also like to exempt specific users (e.g. authors I have followed). But this softer option should allow consultation with LLMs, experiments with LLMs, etc.
I consider it epistemic violence that I occasionally discover, after the fact, that an LLM was used extensively in the writing process of a post.
I think extensive use of LLM should be flagged at the beginning of a post, but “uses an LLM in any part of its production process whatsoever” would probably result in the majority of posts being flagged and make the flag useless for filtering. For example I routinely use LLMs to check my posts for errors (that the LLM can detect), and I imagine most other people do so as well (or should, if they don’t already).
Unfortunately this kind of self flagging/reporting is ultimately not going to work, as far as individually or societally protecting against AI-powered manipulation, and I doubt there will be a technical solution (e.g. AI content detector or other kind of defense) either (short of solving metaphilosophy). I’m not sure it will do more good than harm even in the short run because it can give a false sense of security and punish the honest / reward the dishonest, but still lean towards trying to establish “extensive use of LLM should be flagged at the beginning of a post” as a norm.
“uses an LLM in any part of its production process whatsoever” would probably result in the majority of posts being flagged and make the flag useless for filtering. For example I routinely use LLMs to check my posts for errors (that the LLM can detect), and I imagine most other people do so as well (or should, if they don’t already).
My own data point: for the vast majority of my posts, there is zero LLM involved at any stage.
I recently, rather embarrassingly, made a post with a massive error which an LLM would have found immediately. I seriously misread a paper in a way that cut/pasting the paper and the post into Claude and asking “any egregious misreadings” would have stopped me from making that post. This is far too useful for me to turn down, and this kind of due diligence is +EV for everyone.
Yes mostly agree. Unless the providers themselves log all responses and expose some API to check for LLM generation, we’re probably out of luck here, and incentives are strong to defect.
One thing I was thinking about (similar to what speedrunners do) is just making a self-recording or screen recording of actually writing out the content/post. This could probably be verified by an AI or a neutral third party. Something like a “proof of work” for writing your own content.
If it became common to demand and check proofs of (human) work, there would be a strong incentive to use AI to generate such proofs, which doesn’t seem very hard to do.
Maybe we want a multi-level categorization scheme instead? Something like:
Level 0: Author completely abstains from LLM use in all contexts (not just this post)
Level 1: Author uses LLMs, but this particular post was made with no LLM use whatsoever
Level 2: LLM was used (e.g. to look up information), but no text/images in the post came out of an LLM
Level 3: LLM was used for light editing and/or image generation
Level 4: LLM was used for writing substantial parts
Level 5: Mostly LLM-generated with high-level human guidance/control/oversight
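For concreteness, a minimal sketch of how such a scheme might be wired into a feed filter; the enum and function here are hypothetical illustrations, not an existing LessWrong feature:

```python
from enum import IntEnum

class LLMDisclosure(IntEnum):
    """Hypothetical disclosure levels mirroring the scheme above."""
    AUTHOR_ABSTAINS = 0      # author uses no LLMs in any context
    NONE_THIS_POST = 1       # author uses LLMs, but not for this post
    LOOKUP_ONLY = 2          # LLM consulted, but no LLM text/images in the post
    LIGHT_EDITING = 3        # light editing and/or image generation
    SUBSTANTIAL_WRITING = 4  # substantial parts written by an LLM
    MOSTLY_GENERATED = 5     # mostly LLM-generated with human oversight

def visible(post_level: LLMDisclosure, reader_max: LLMDisclosure) -> bool:
    """Show a post only if its disclosed level is at or below the reader's threshold."""
    return post_level <= reader_max

# A reader who tolerates light editing but nothing heavier:
print(visible(LLMDisclosure.LIGHT_EDITING, LLMDisclosure.LIGHT_EDITING))        # True
print(visible(LLMDisclosure.SUBSTANTIAL_WRITING, LLMDisclosure.LIGHT_EDITING))  # False
```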
This is an edge case, but just flagging that it’s a bit unclear to me how to apply this to my own post in a useful way. As I’ve disclosed in the post itself:
OpenAI’s o3 found the idea for the dovetailing procedure. The proof of the efficient algorithmic Kraft coding in the appendix is mine. The entire post is written by myself, except the last paragraph of the following section, which was first drafted by GPT-5.
Does this count as Level 3 or 4? o3 provided a substantial idea, but the resulting proof was entirely written down by myself. I’m also unsure whether the full drafting of precisely one paragraph (which summarizes the rest of the post) by GPT-5 counts as editing or the writing of substantial parts.
We need another “level” here, probably parallel to the others, for when LLMs are used for idea generation, criticism of outlines, as a discussion partner, et cetera. For instance, let’s say I think about countries that are below their potential in some tragic way, like Russia and Iran: countries with loads of cultural capital and an educated population, that historically have had a lot going for them. Then I can ask an LLM “any other countries like that?” and it might mention, say, North Korea, Iraq and Syria, maybe Greece or Turkey or South Italy, with some plausible story attached to each. When I do this interaction with an LLM, the end product is going to be colored by it. If I initially intended to talk about how Russia and Iran have been destroyed by some particular form of authoritarianism, my presentation, hypothesis, or whatever will likely be modified so I can put Greece and Iraq into the same bucket. This alters my initial thoughts and probably channels my thought-generation process into a mold more or less shaped by the LLM, “hacking my brain”. When this happens across many posts, it’s likely to homogenize writing not through style, but through semantic content.
This example is kinda weak, but I think this is the kind of thing OP is worried about. But I’d be curious to hear stronger examples if anyone can think of them.
I use LLMs for basically anything substantial that I write. Like, a lot of my knowledge of random facts about the world is downstream of having asked LLMs about it. It would be IMO pretty dumb to write a post that is e.g. trying to learn from past social movement failures and not have an LLM look over it to see whether it’s saying anything historically inaccurate.
So I do think there needs to be some bar here that is not “LLMs were involved in any way”. I do share a bunch of concerns in the space.
Like, a lot of my knowledge of random facts about the world is downstream of having asked LLMs about it.
Uhhh… that seems maybe really bad. Do you sometimes do the kind of check which, if it were applied to The New York Times pre-AI, would be sufficient to make Gell-Mann Amnesia obvious?
Personally, the most I’ve relied on LLMs for a research project was the project behind this shortform in February 2025, and in hindsight (after reading up on some parts more without an LLM) I think I ended up with a very misleading big picture as a result. I no longer use LLMs for open-ended learning like that; it was worth trying but not a good idea in hindsight.
It would be IMO pretty dumb to write a post that is e.g. trying to learn from past social movement failures and not have an LLM look over it to see whether it’s saying anything historically inaccurate.
Do you then look over what the LLM has said and see whether it’s saying anything historically inaccurate, without using an LLM?
Not most of the time! Like, I sometimes ask multiple LLMs, but I don’t verify every fact that an LLM tells me, unless it’s a domain where I predict LLMs are particularly likely to hallucinate. I keep in mind that stuff is sometimes hallucinated, but most of the time it’s fine to know that something is quite probably true.
There’s no such thing as “a domain where LLMs are particularly likely to hallucinate”. In every domain there’s some obscure jagged boundary, not very far from normal standard questions to ask, where LLMs will hallucinate, usually plausibly to a non-expert.
To me, this sounds like you’re simply pushing the problem a little bit downstream without actually addressing it. You’re still not verifying the facts; you’re just getting another system with similar flaws to the first (you). You aren’t actually fact checking at any point.
That is not how Bayesian evidence works. I am treating LLM output as somewhat less trustworthy than I would trust what a colleague of mine says, but not fundamentally different. I am skeptical that you spend your days double-checking every conversation you have with another human. I also don’t think you should spend your days double-checking every single thing an LLM tells you.
This feels kind of like the early conversations about Wikipedia where people kept trying to insist Wikipedia is “not a real source”.
I am treating LLM output as somewhat less trustworthy than I would trust what a colleague of mine says, but not fundamentally different.
If you’re asking a human about some even mildly specialized topic, like history of Spain in the 17th century or different crop rotation methods or ordinary differential equations, and there’s no special reason that they really want to appear like they know what they’re talking about, they’ll generally just say “IDK”. LLMs are much less like that IME. I think this is actually a big difference in practice, at least in the domains I’ve tried (reproductive biology). LLMs routinely give misleading / false / out-of-date / vague-but-deceptively-satiating summaries.
I agree the LLMs are somewhat worse, especially compared to rationalist-adjacent experts in specialized fields, but they really aren’t that bad for most things. Like I researched the state of the art of datacenter security practices yesterday, and I am not like 99% confident that the AI got everything right, but I am pretty sure it helped me understand the rough shape of things a lot better.
This seems fine and good—for laying some foundations, which you can use for your own further theorizing, which will make you ready to learn from more reliable + rich expert sources over time. Then you can report that stuff. If instead you’re directly reporting your immediately-post-LLM models, I currently don’t think I want to read that stuff, or would want a warning. (I’m not necessarily pushing for some big policy, that seems hard. I would push for personal standards though.)
Fwiw, in my experience LLMs lie far more than early Wikipedia or any human I know, and in subtler and harder to detect ways. My spot checks for accuracy have been so dismal/alarming that at this point I basically only use them as search engines to find things humans have said.
I am wondering whether your experiences were formed via the first generation of reasoning models, and my guess is you are also thinking of asking different kinds of questions.
The thing that LLMs are really great at is to speak and think in the ontology and structure that is prevalent among experts in any field. This is usually where the vast majority of evidence comes from. LLMs aren’t going to make up whole ontologies about how bankruptcy law works, or how datacenter security works. It might totally make up details, but it won’t make up the high-level picture.
Second, this has just gotten a lot better over the last 6 months. GPT-5 still lies a good amount, but vastly less than o1 or o3. I found o1 almost unusable on this dimension.
Datapoint: I’m currently setting up a recording studio at Lighthaven, and I am using them all the time to get guides for things like “how to change a setting on this camera” or “how to use this microphone” or “how to use this recording software”.
Yes, they confabulate menus and things a lot, but as long as I keep uploading photos of what I actually see, they know the basics much better than me (e.g. what bit rate to set the video vs the audio, where to look to kill the random white noise input I’m getting, etc).
I’d say they confabulate like 50% of the time but that they’re still a much more effective search engine for me than google, and can read the manual much faster than me. My guess is I simply couldn’t do some of the projects I’m doing without them.
It’s perfectly fine to have strong personal preferences for what content you consume, and how it’s filtered, and to express these preferences. I don’t think it’s cool to make hyperbolic accusations of violence. It erodes the distinctions we make between different levels of hostility that help prevent conflicts from escalating. I don’t think undisclosed LLM assistance can even be fairly characterized as deceptive, much less violent.
I don’t think it’s hyperbolic at all; I think this is in fact a central instance of the category I’m gesturing at as “epistemic violence.” For instance, p-hacking, lying, manipulation, misleading data, etc. If you don’t think that category is meaningful or you dislike my name for it, can you be more specific about why? Or why this is not an instance? Another commenter @Guive objected to my usage of the word violence here because “words can’t be violence” which is I think a small skirmish of a wider culture war which I am really not trying to talk about.
To be explicit (again) I do not in any way want to imply that somehow a person using an LLM without disclosing it justifies physical violence against them. I also don’t think it’s intentionally an aggression. But depending on the case, it CAN BE seriously negligent towards the truth and community truth seeking norms, and in that careless negligence it can damage the epistemics of others, when a simple disclaimer / “epistemic status” / source would have been VERY low effort to add. I have to admit I hesitate to say this so explicitly a bit because many people I respect use LLMs extensively, and I am not categorically against this, and I feel slightly bad about potentially burdening or just insulting them—generally speaking I feel some degree of social pressure against saying this. And as a result I hesitate to back down from my framing, without a better reason than that it feels uncomfortable and some people don’t like it.
The way I think about this is a bit more like “somehow, we need immune systems against arbitrary nuanced persuasion.” Which is for sure a very hard problem, but, I don’t think simple tricks of “check if LLM influenced” will turn out to be that useful.
I think at the very least you want more metadata about how the AI was used.
Something like “somehow automatically track metadata about how documents came to be and include it”, the way you might try to do with photography. (I guess the metaphor here is more like “have text documents automatically include info about what text entered via “paste” instead of by typing manually?”)
It tends to be bad (or at least costly) to have a rule that has the property that violations of the rule cannot reliably be detected, which leads to the question of how you propose to detect LLM-written content.
You can see the chat here. I prompted Claude with a detailed outline, a previous draft that followed a very different structure, and a copy of “The case for ensuring powerful AIs are controlled” for reference about my writing style. The outline I gave Claude is in the Outline tab, and the old draft I provided is in the Old draft tab, of this doc.
As you can see, I did a bunch of back and forth with Claude to edit it. Then I copied to a Google doc and edited substantially on my own to get to the final product.
And I don’t think he did anything against LessWrong rules, or anything immoral really, but I still really don’t like it.
If it was up to me, we’d have a rule that every single word in your post should either be physically typed by you, or be in quotation marks.
So it’s fine if you copy your article into some AI and ask it to fix grammar mistakes, as long as you go and fix them yourself.
It’s also fine to have a fair bit of LLM involvement in the post, even conceptual stuff and writing, as long as the finished product is typed up by you.
That way I know every single word has at least passed through the brain of the author.
UPDATE: the blog post I referred to from Scott Aaronson has now been updated to reflect that (a) the trick from GPT-5 “should have been obvious,” but more importantly (b) a human has already come up with a better trick (directly replacing the one from GPT-5) which resolves an open problem left in the paper. To me, this opens the possibility that GPT-5 may have net slowed down the overall research process (do we actually want papers to be written faster, but of possibly lower quality?), though I would guess it was still a productivity booster. Still, I am less impressed with this result than I was at first.
To be clear, I wasn’t talking about the kind of weak original insight like saying something technically novel and true but totally trivial or uninteresting (like, say, calculating the value of an expression that was maybe never calculated before but could be calculated with standard techniques). Obviously, this is kind of a blurred line, but I don’t think it’s an empty claim at all: an LLM could falsify my claim by outputting the proof of a conjecture that mathematicians were interested in.
At the time, IMO no one could come up with a convincing counterexample. Now, I think the situation is a lot less clear, and it’s very possible that this will in retrospect be the LAST time I can reasonably claim that what I said holds up. For instance, GPT-5 apparently helped Scott Aaronson prove a significant result: https://scottaaronson.blog/?p=9183#comments
This required some back and forth iteration where it made confident mistakes he had to correct. And, it’s possible that this tiny part of the problem didn’t require original thinking on its own.
However, it’s also possible that I am actually just on copium and should admit I was wrong (or at least, what I said then is wrong now). I’m not sure. Anything slightly more convincing than this would be enough to change my mind.
I’m aware of various small improvements to combinatorial bounds, usually either from specialized systems or not hard enough to be interesting, or (usually) both. Has anyone seen anything beyond this (and beyond Aaronson’s example)?
For my part, I now (somewhat newly) find LLMs useful as a sort of fuzzy search engine which can be used before the real search engine to figure out what to search, which includes usefulness for research, but certainly does not include DOING research.
Some signal: Daniel Litt, the mathematician who seems most clued-in regarding LLM use, still doesn’t think there have been any instances of LLMs coming up with new ideas.
I’m currently watching this space closely, but I don’t think anything so far has violated my model. LLMs may end up useful for math in the “prove/disprove this conjecture” way, but not in the “come up with new math concepts (/ideas)” way.
Ah, though perhaps our cruxes there differed from the beginning, if you count “prove a new useful conjecture” as a “novel insight”. IMO, that’d only make them good interactive theorem provers, and wouldn’t bear much on the question of “can they close the loop on R&D/power the Singularity”.
Meta: I find the tone of your LW posts and comments to be really good in some way and I want to give positive feedback and try to articulate what I like about the vibe.
I’d describe it as something like: independent thinking/somewhat original takes expressed in a chill, friendly manner, without deliberate contrarianism, and scout mindset but not in a performative way. Also no dunking on stuff. But still pushing back on arguments you disagree with.
To my tastes this is basically optimal. Hope this comment doesn’t make you overthink it in the future. And maybe this can provide some value to others who are thinking about what style to aim for/promote.
Edit: maybe a short name for it is being intellectually disagreeable while being socially agreeable? Idk that’s probably an oversimplification.
I think it might be worthwhile to distinguish cases where LLMs came up with a novel insight on their own vs. were involved, but not solely responsible.
You wouldn’t credit Google for the breakthrough of a researcher who used Google when making a discovery, even if the discovery wouldn’t have happened without the Google searches. The discovery maybe also wouldn’t have happened without the eggs and toast the researcher had for breakfast.
“LLMs supply ample shallow thinking and memory while the humans supply the deep thinking” is a different and currently much more believable claim than “LLMs can do deep thinking to come up with novel insights on their own.”
In my view, you don’t get novel insights without deep thinking except extremely rarely by random, but you’re right to make sure the topic doesn’t shift without anyone noticing.
Full Scott Aaronson quote in case anyone else is interested:
This is the first paper I’ve ever put out for which a key technical step in the proof of the main result came from AI—specifically, from GPT5-Thinking. Here was the situation: we had an N×N Hermitian matrix E(θ) (where, say, N=2^n), each of whose entries was a poly(n)-degree trigonometric polynomial in a real parameter θ. We needed to study the largest eigenvalue of E(θ), as θ varied from 0 to 1, to show that this λmax(E(θ)) couldn’t start out close to 0 but then spend a long time “hanging out” ridiculously close to 1, like 1/exp(exp(exp(n))) close for example.
Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked GPT5-Thinking. After five minutes, it gave me something confident, plausible-looking, and (I could tell) wrong. But rather than laughing at the silly AI like a skeptic might do, I told GPT5 how I knew it was wrong. It thought some more, apologized, and tried again, and gave me something better. So it went for a few iterations, much like interacting with a grad student or colleague. Within a half hour, it had suggested to look at the function
(the expression doesn’t copy-paste properly)
It pointed out, correctly, that this was a rational function in θ of controllable degree, that happened to encode the relevant information about how close the largest eigenvalue λmax(E(θ)) is to 1. And this … worked, as we could easily check ourselves with no AI assistance. And I mean, maybe GPT5 had seen this or a similar construction somewhere in its training data. But there’s not the slightest doubt that, if a student had given it to me, I would’ve called it clever. Obvious with hindsight, but many such ideas are.
I had tried similar problems a year ago, with the then-new GPT reasoning models, but I didn’t get results that were nearly as good. Now, in September 2025, I’m here to tell you that AI has finally come for what my experience tells me is the most quintessentially human of all human intellectual activities: namely, proving oracle separations between quantum complexity classes.
“Here we present Copilot for Real-world Experimental Scientists (CRESt), a platform that integrates large multimodal models (LMMs, incorporating chemical compositions, text embeddings, and microstructural images) with Knowledge-Assisted Bayesian Optimization (KABO) and robotic automation. [...] CRESt explored over 900 catalyst chemistries and 3500 electrochemical tests within 3 months, identifying a state-of-the-art catalyst in the octonary chemical space (Pd–Pt–Cu–Au–Ir–Ce–Nb–Cr) which exhibits a 9.3-fold improvement in cost-specific performance.”
“We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism [...] Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. [...] This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.”
FWIW, my understanding is that Evo 2 is not a generic language model that is able to produce innovations, it’s a transformer model trained on a mountain of genetic data which gave it the ability to produce new functional genomes. The distinction is important, see a very similar case of GPT-4b.
The textbook reading group on “An Introduction to Universal Artificial Intelligence,” which introduces the necessary background for AIXI research, has started, and really gets underway this Monday (Sept. 8th) with sections 2.1–2.6.2. Now is about the last chance to easily jump in (since we have only read the intro chapter 1 so far). Please read in advance and be prepared to ask questions and/or solve some exercises. The first session had around 20–25 attendees; we will probably break up into groups of 5.
Reach out to me in advance for a meeting link, DM or preferably colewyeth@gmail.com. Include your phone number if you want to be added to the WhatsApp group (optional).
I wonder if the reason that polynomial-time algorithms tend to be somewhat practical (not runtime n^100) is just that we aren’t smart enough to invent polynomial-time algorithms that really are necessarily that complicated.
Like, the obvious way to get n^100 is to nest 100 for loops. A problem which can only be solved in polynomial time by nesting 100 for loops (presumably doing logically distinct things that cannot be collapsed!) is a problem that I am not going to solve in polynomial time…
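As a toy sketch of that point (purely illustrative; itertools.product just stands in for writing the k loops out by hand):

```python
import itertools

def nested_count(n: int, k: int) -> int:
    """k nested loops over range(n) perform n**k units of work,
    so the polynomial's degree is exactly the nesting depth."""
    work = 0
    for _ in itertools.product(range(n), repeat=k):
        work += 1
    return work

assert nested_count(3, 4) == 3 ** 4  # degree 4 from 4 nested loops
```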
Reasons I deem more likely:
1. Selection effect: if it’s infeasible you don’t work on it / don’t hear about it. In my personal experience, n^3 is already slow.
2. If the k in n^k is high, you probably have some representation where k is a parameter, and so you say it’s exponential in k, not that it’s polynomial.
1: Not true, I hear about exponential time algorithms! People work on all sorts of problems only known to have exponential time algorithms.
2: Yes, but the reason k only shows up as something we would interpret as a parameter and not as a result of the computational complexity of an algorithm invented for a natural problem is perhaps because of my original point—we can only invent the algorithm if the problem has structure that suggests the algorithm, in which case the algorithm is collapsible and k can be separated out as an additional input for a simpler algorithm.
I think the canonical high-degree-polynomial problem is high-dimensional search. We usually don’t implement exact grid search because we can deploy Monte Carlo or gradient descent. I wonder if there are any hard lower bounds on approximation hardness for polynomial-time problems.
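A rough sketch of that contrast, assuming “cost” just means the number of objective evaluations (the quadratic objective and the sample budget below are made up for illustration):

```python
import random

def grid_search_cost(m: int, d: int) -> int:
    """Exhaustive grid search with m points per axis in d dimensions:
    m**d evaluations, i.e. a degree-d polynomial in the resolution m."""
    return m ** d

def random_search(f, d: int, budget: int, lo: float = 0.0, hi: float = 1.0):
    """Monte Carlo alternative: a fixed evaluation budget regardless of resolution."""
    best_x, best_val = None, float("inf")
    for _ in range(budget):
        x = [random.uniform(lo, hi) for _ in range(d)]
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Minimizing a simple quadratic in 10 dimensions: 1,000 random samples
# versus the 10**10 evaluations an exhaustive 10-points-per-axis grid would need.
f = lambda x: sum((xi - 0.5) ** 2 for xi in x)
print(grid_search_cost(10, 10))                # 10000000000
print(random_search(f, d=10, budget=1000)[1])  # small, but not exactly 0
```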
A fun illustration of survivorship/selection bias is that nearly every time I find myself reading an older paper, I find it insightful, cogent, and clearly written.
Selection bias isn’t the whole story. The median paper in almost every field is notably worse than it was in, say, 1985. Academia is less selective than it used to be—in the U.S., there are more PhDs per capita, and the average IQ/test scores/whatever metric has dropped for every level of educational attainment.
Grab a journal that’s been around for a long time, read a few old papers and a few new papers at random, and you’ll notice the difference.
To what degree is this true regarding elite-level Ph.D. programs that are likely to lead to publication in (i) mathematics and/or (ii) computer science?
Separately, we should remember that academic selection is a relative metric, i.e. graded on a curve. So, when it comes to Ph.D. programs, is the median 2024 Ph.D. graduate more capable (however you want to define it) than the corresponding graduate from 1985? This is complex, involving their intellectual foundations, the depth of their specialized knowledge, various forms of raw intelligence, attention span, collaborative skills, communication ability (including writing skills), and computational tools.
I realize what I’m about to say next may not be representative of the median Ph.D. student, but it feels to me the 2024 graduates of, say, Berkeley or MIT (not to mention, say, Thomas Jefferson High School) are significantly more capable than the corresponding 1985 graduates. Does my sentiment resonate with others and/or correspond to some objective metrics?
Based on my observations, I would also think the current publication-chasing culture could get people to push out papers more quickly (in some particular domains like CS), even though some papers may be only partially complete.
Rationality (and other) heuristics I’ve actually found useful for getting stuff done, but unfortunately you probably won’t:
1: Get it done quickly and soon. Every step of every process outside of yourself will take longer than expected, so the effective deadline is sooner than you might think. Also if you don’t get it done soon you might forget (or forget some steps).
1(A): 1 is stupidly important.
2: Do things that set off positive feedback loops. Aggressively try to avoid doing other things. I said aggressively.
2(A): Read a lot, but not too much.*
3: You are probably already making fairly reasonable choices over the action set you are considering. It’s easiest to fall short(er) of optimal behavior by failing to realize you have affordances. Discover affordances.
4: Eat.
(I think 3 is least strongly held)
*I’m describing how to get things done. Reading more has other benefits, for instance if you don’t know the thing you want to get done yet, and it’s pleasant and self-actualizing.
The primary optimization target for LLM companies/engineers seems to be making them seem smart to humans, particularly the nerds who seem prone to using them frequently. A lot of money and talent is being spent on this. It seems reasonable to expect that they are less smart than they seem to you, particularly if you are in the target category. This is a type of Goodharting.
In fact, I am beginning to suspect that they aren’t really good for anything except seeming smart, and most rationalists have totally fallen for it, for example Zvi insisting that anyone who is not using LLMs to multiply their productivity is not serious (this is a vibe not a direct quote but I think it’s a fair representation of his writing over the last year). If I had to guess, LLMs have 0.99x’ed my productivity by occasionally convincing me to try to use them which is not quite paid for by very rarely fixing a bug in my code. The number is close to 1x because I don’t use them much, not because they’re almost useful. Lots of other people seem to have much worse ratios because LLMs act as a superstimulus for them (not primarily a productivity tool).
Certainly this is an impressive technology, surprising for its time, and probably more generally intelligent than anything else we have built—not going to get into it here, but my model is that intelligence is not totally “atomic” but has various pieces, some of which are present and some missing in LLMs. But maybe the impressiveness is not a symptom of intelligence, but the intelligence a symptom of impressiveness—and if so, it’s fair to say that we have (to varying degrees) been tricked.
I use LLMs throughout my personal and professional life. The productivity gains are immense. Yes, hallucination is a problem, but it’s just like spam/ads/misinformation on Wikipedia/the internet—a small drawback that doesn’t obviate the ginormous potential of the internet/LLMs.
I am 95% certain you are leaving value on the table.
I do agree straight LLMs are not generally intelligent (in the sense of universal intelligence/AIXI) and therefore not completely comparable to humans.
On LLMs vs. search on the internet: I agree that LLMs are very helpful in many ways, both personally and professionally, but the worse parts of misinformation from LLMs compared to Wikipedia/the internet, in my opinion, include: 1) it is relatively more unpredictable when the model will hallucinate, whereas for Wikipedia/the internet you would generally expect higher accuracy for simpler/purely factual/mathematical information; 2) it is harder to judge credibility without knowing the source of the information, whereas on the internet we can get some signals from the website domain, etc.
From my personal experience, I agree. I find myself unexcited about trying the newest LLM models. My main use-case in practice these days is Perplexity, and I only use it when I don’t care much about the accuracy of the results (which ends up being a lot, actually… maybe too much). Perplexity confabulates quite often even with accurate references in hand (but at least I can check the references). And it is worse than me at the basics of googling things, so it isn’t as if I expect it to find better references than me; the main value-add is in quickly reading and summarizing search results (although the new Deep Research option on Perplexity will at least iterate through several attempted searches, so it might actually find things that I wouldn’t have).
I have been relatively persistent about trying to use LLMs for actual research purposes, but the hallucination rate seems to go to 100% almost whenever an accurate result would be useful to me.
The hallucination rate does seem adequately low when talking about established mathematics (so long as you don’t ask for novel implications, such as applying ideas to new examples). For this and other reasons I think they can be quite helpful for people trying to get oriented to a subfield they aren’t familiar with—it can make for a great study partner, so long as you verify what it says by checking other references.
Also decent for coding, of course, although the same caveat applies—coders who are already an expert in what they are trying to do will get much less utility out of it.
I recently spoke to someone who made a plausible claim that LLMs were 10xing their productivity in communicating technical ideas in AI alignment with something like the following workflow:
Take a specific cluster of failure modes for thinking about alignment which you’ve seen often.
Hand-write a large, careful prompt document about the cluster of alignment failure modes, which includes many specific trigger-action patterns (if someone makes mistake X, then the correct counterspell to avoid the mistake is Y). This document is highly opinionated and would come off as rude if directly cited/quoted; it is not good communication. However, it is something you can write once and use many times.
When responding to an email/etc, load the email and the prompt document into Claude and ask Claude to respond to the email using the document. Claude will write something polite, informative, and persuasive based on the document, with maybe a few iterations of correcting Claude if its first response doesn’t make sense. The person also emphasized that things should be written in small pieces, as quality declines rapidly when Claude tries to do more at once.
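For what it’s worth, a minimal sketch of what that last step could look like through the Anthropic API rather than the Claude app (the file names and model string are placeholders; the person described doing this interactively, with a few rounds of correction):

```python
# Sketch only: assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment.
from pathlib import Path
import anthropic

client = anthropic.Anthropic()

failure_modes_doc = Path("alignment_failure_modes.md").read_text()  # hand-written once, reused many times
incoming_email = Path("email.txt").read_text()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whichever Claude model you prefer
    max_tokens=1000,
    system=(
        "Draft a polite, informative reply to the email below. "
        "Base the substance on the reference document, but do not quote it directly, "
        "and keep the reply short."
    ),
    messages=[{
        "role": "user",
        "content": f"Reference document:\n{failure_modes_doc}\n\nEmail to answer:\n{incoming_email}",
    }],
)
print(response.content[0].text)  # first draft; iterate if it misses the point
```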
They also mentioned that Claude is awesome at coming up with meme versions of ideas to include in powerpoints and such, which is another useful communication tool.
So, my main conclusion is that there isn’t a big overlap between what LLMs are useful for and what I personally could use. I buy that there are some excellent use-cases for other people who spend their time doing other things.
Still, I agree with you that people are easily fooled into thinking these things are more useful than they actually are. If you aren’t an expert in the subfield you’re asking about, then the LLM outputs will probably look great due to Gell-Mann Amnesia type effects. When checking to see how good the LLM is, people often check the easier sorts of cases which the LLMs are actually decent at, and then wrongly generalize to conclude that the LLMs are similarly good for other cases.
for example Zvi insisting that anyone who is not using LLMs to 10x their productivity is not serious … a vibe not a direct quote
I expect he’d disagree, for example I vaguely recall him mentioning that LLMs are not useful in a productivity-changing way for his own work. And 10x specifically seems clearly too high for most things even where LLMs are very useful, other bottlenecks will dominate before that happens.
10x was probably too strong, but his posts are very clear that he thinks it’s a large productivity multiplier. I’ll try to remember to link the next instance I see.
AI doesn’t accelerate my writing much, although it is often helpful in parsing papers and helping me think through things. But it’s a huge multiplier on my coding, like more than 10x.
This is a Schelling point for AI x-risk activism.
I appreciate all of the reviews of “If Anyone Builds It…”, particularly from those with strong disagreements. Of course, this community really rewards independent thinking (sometimes to the point of contrarian nitpicking), and this is a very good thing.
However, I basically agree with the central claims of the book and most of the details. So I want to point out that this is a good time for more ~typical normie activism, if you think the book is basically right.
I have personally convinced a few people to buy it, advertised it on basically every channel available to me, and plan to help run the Kitchener-Waterloo reading group (organized by @jenn).
Maybe it’s not the best possible introduction to x-risk, maybe it’s not even the best one out there, but it is the Schelling introduction to AI x-risk.
Maybe activism can possibly backfire in some cases, but non-activism does not look like a plan to me.
Maybe this isn’t the best possible time for activism, but it is the Schelling time for activism.
So if you are part of this coalition, even if you have some disagreements, please spread the word now.
Mathematics students are often annoyed that they have to worry about “bizarre or unnatural” counterexamples when proving things. For instance, differentiable functions without continuous derivative are pretty weird. Particularly engineers tend to protest that these things will never occur in practice, because they don’t show up physically. But these adversarial examples show up constantly in the practice of mathematics—when I am trying to prove (or calculate) something difficult, I will try to cram the situation into a shape that fits one of the theorems in my toolbox, and if those tools don’t naturally apply I’ll construct all kinds of bizarre situations along the way while changing perspective. In other words, bizarre adversarial examples are common in intermediate calculations—that’s why you can’t just safely forget about them when proving theorems. Your logic has to be totally sound as a matter of abstraction or interface design—otherwise someone will misuse it.
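For concreteness, the standard example of a differentiable function without a continuous derivative (a textbook fact, added here for illustration):

```latex
f(x) =
\begin{cases}
x^{2}\sin(1/x), & x \neq 0,\\
0, & x = 0,
\end{cases}
\qquad
f'(x) =
\begin{cases}
2x\sin(1/x) - \cos(1/x), & x \neq 0,\\
0, & x = 0.
\end{cases}
```

Here f'(0) exists because |h² sin(1/h)| ≤ h², yet the cos(1/x) term keeps f' oscillating between roughly −1 and 1 near 0, so f' is not continuous there.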
While I think the reaction against pathological examples can definitely make sense, and in particular some people have a bad habit of over-focusing on pathological examples, I do think mathematics is quite different from other fields in that you want to prove that a property holds for all objects of a certain kind, or that there exists an object with a certain property. In these cases you can’t ignore the pathological examples, because they can either provide you with solutions to your problem or show why your approach can’t work.
This is why I didn’t exactly like Dalcy’s point 3 here:
There is also the reverse case, where it is often common practice in math or logic to ignore bizarre and unnatural counterexamples. For example, first-order Peano arithmetic is often identified with Peano arithmetic in general, even though the first-order theory allows the existence of highly “unnatural” numbers, which are certainly not the natural numbers that Peano arithmetic is supposed to be about.
Another example is the power set axiom in set theory. It is usually assumed to imply the existence of the power set of each infinite set. But the axiom only implies that the existence of such power sets is possible, i.e. that they can exist (in some models), not that they exist full stop. In general, non-categorical theories are often tacitly assumed to talk about some intuitive standard model, even though the axioms don’t specify it.
From Soares and Fallenstein, “Toward Idealized Decision Theory”:
“If someone cannot formally state what it means to find the best decision in theory, then they are probably not ready to construct heuristics that attempt to find the best decision in practice.”
This statement seems rather questionable. I wonder if it is a load-bearing assumption.
I’m not sure what you mean. What is “best” is easily arrived at. If you’re a financier and your goal is to make money, then any formal statement of your decision will be about maximizing money. If you’re a swimmer and your goal is to win an Olympic gold medal, then a formal statement of your decision will obviously include “win gold medal”—part of the plan to execute it may include “beat the current world record for swimming in my category”, but “best” isn’t doing the heavy lifting here; the actual formal statement that encapsulates all the factors, such as the milestones, is.
And if someone doesn’t know what they mean when they think of what is best—then the statement holds true. If you don’t know what is “best” then you don’t know what practical heuristics will deliver you “good enough”.
To put it another way—what are the situations where not defining in clear terms what is best still leads to well-constructed heuristics for finding the best decision in practice? (I will undercut myself—there is something to be said for exploration[1] and “F*** Around and Find Out” with no particular goal in mind.)
and if they are, how do you define the direction such that you’re sure that among all possible worlds, maximizing this statement actually produces the world that maxes out goal-achievingness?
that’s where decision theories seem to me to come in. the test cases of decision theories are situations where maxing out, eg, CDT, does not in fact produce the highest-goal-score world. that seems to me to be where the difference Cole is raising comes up: if you’re merely moving in the direction of good worlds you can have more complex strategies that potentially make less sense but get closer to the best world, without having properly defined a single mathematical statement whose maxima is that best world. argmax(CDT(money)) may be less than genetic_algo(policy, money, iters=1b) even though argmax is a strict superlative, if the genetic algo finds something closer to, eg, argmax(FDT(money)).
edit: in other words, I’m saying “best” as opposed to “good”. what is good is generally easily arrived at. it’s not hard to find situations where what is best is intractable to calculate, even if you’re sure you’re asking for it correctly.
how do you define the direction such that you’re sure that among all possible worlds, maximizing this statement actually produces the world that maxes out goal-achievingness?
by using the suffix “-(e)st”. “The fastest” “the richest” “the purple-est” “the highest” “the westernmost”. That’s the easy part—defining theoretically what is best. Mapping that theory to reality is hard.
Can you rephrase that—because you’re mentioning theory and possibility at once which sounds like an oxymoron to me. That which is in theory best implies that which is impossible or at least unlikely. If you can rephrase it I’ll probably be able to understand what you mean.
Also, if you had a ‘magic wand’ and could change a whole raft of things at once, do you have a vision of your “best” life that you prefer? Not necessarily a likely or even possible one, but one that, of all the fantasies you can imagine, is preeminent? That seems to me to be a very easy way to define the “best”—it’s the one that the agent wants most. I assume most people have visions of their own “best” lives; am I a rarity in this? Or do most people just kind of never think about what-ifs and have fantasies? And isn’t that fantasy, or the model of the self and your own preferences that influences it, going to similarly be part of the model that dictates what you “know” would improve your life significantly?
Because if you consider it an improvement, then you see it as being better. It’s basic English: Good, Better, Best.
I think that “ruggedness” and “elegance” are alternative strategies for dealing with adversity—basically tolerating versus preparing for problems. Both can be done more or less skillfully: low-skilled ruggedness is just being unprepared and constantly suffering, but the higher-skilled version is being strong, healthy, and conditioned enough to survive harsh circumstances without suffering. Low-skilled elegance is a waste of time (e.g. too much makeup but terrible skin), and high-skilled elegance is… okay, basically being ladylike and sophisticated. Yes, I admit it, this is mostly about gender.
Other examples: it’s rugged to have a very small number of high-quality possessions you can easily throw in a backpack in under 20 minutes, including 3 outfits that cover all occasions. It’s elegant to travel with three suitcases containing everything you could possibly need to look and feel your best, including both an ordinary umbrella and a sun umbrella.
I also think a lot of misunderstanding between genders results from these differing strategies, because to some extent they both work but are mutually exclusive. Elegant people may feel taken advantage of because everyone starts expecting them to do all the preparation. Rugged people may feel they aren’t given enough autonomy and get impatient (“no, I’ll be fine without sunscreen”). There are obvious advantages to having a rugged and elegant member of a team or couple though.
Thanks to useful discussions with my friends / family: Ames, Adriaan, Lauren. Loosely expect I picked the idea up from someone else, can’t be original.
Other examples: it’s rugged to have a very small number of high-quality possessions you can easily throw in a backpack in under 20 minutes, including 3 outfits that cover all occasions. It’s elegant to travel with three suitcases containing everything you could possibly need to look and feel your best, including both an ordinary umbrella and a sun umbrella.
This one highlights that the sense of “elegant” you mean is not the math & engineering sense, which is associated with minimalism.
If you asked me to guess what the ‘elegant’ counterpoint to ‘traveling with a carefully curated set of the very best prepper/minimalist/nomad/hiker gear which ensures a bare minimum of comfort’ was, I would probably say something like ‘traveling with nothing but cash/credit card/smartphone’. You have elegantly solved the universe of problems you encounter while traveling by choosing a single simple tool which can obtain nearly anything from the universe of solutions.
Your categories are not essentially gendered, although I understand why we feel that way. For example, in your travel-packing example my wife would be considered rugged while I would be considered elegant, under your definitions. I also think that in traditional Chinese culture, both of your definitions would be considered masculine. (Sorry women, I guess you get nothing lol)
I also think that we apply these strategies unequally in different parts of our lives. I’d guess that if you have to give a research talk at a conference, you’d take an ‘elegant’ approach of “let me prepare my talk well and try to anticipate possible questions the audience will have” instead of “let me do the minimal prep and then just power through any technical difficulties or difficult questions”.
Maybe our gender socialization leads us to favour different strategies in different situations along gendered lines?
to complicate this along gender lines for fun, when i first read your first sentence i totally reversed the descriptions since it’s rugged and masculine to tackle problems and elegant and feminine to tolerate them. per a random edgy tumblr i follow:
that sounds more “rugged” than “elegant” by your definitions, no?
Since this is mid-late 2025, we seem to be behind the aggressive AI 2027 schedule? The claims here are pretty weak, but if LLMs really don’t boost coding speed, this description still seems to be wrong.
[edit: okay actually it’s pretty much mid 2025 still, months don’t count from zero though probably they should because they’re mod 12]
I don’t think there’s enough evidence to draw hard conclusions about this section’s accuracy in either direction, but I would err on the side of thinking ai-2027’s description is correct.
Footnote 10, visible in your screenshot, reads:
For example, we think coding agents will move towards functioning like Devin. We forecast that mid-2025 agents will score 85% on SWEBench-Verified.
(Is it fair to allow pass@k? This Manifold Market doesn’t allow it for its own resolution, but here I think it’s okay, given that the footnote above makes claims about ‘coding agents’, which presumably allow iteration at test time.)
Also, note the following paragraph immediately after your screenshot:
The agents are impressive in theory (and in cherry-picked examples), but in practice unreliable. AI twitter is full of stories about tasks bungled in some particularly hilarious way. The better agents are also expensive; you get what you pay for, and the best performance costs hundreds of dollars a month.11 Still, many companies find ways to fit AI agents into their workflows.12
AI twitter sure is full of both impressive cherry-picked examples and stories about bungled tasks. I also agree that the claim about “find[ing] ways to fit AI agents into their workflows” is exceedingly weak. But it’s certainly happening. A quick Google for “AI agent integration” turns up this article from IBM, where agents are diffusing across multiple levels of the company.
If I understand correctly, Claude’s pass@X benchmarks mean sampling multiple times and taking the best result. This is valid so long as the compute cost doesn’t exceed the equivalent cost of an engineer.
Codex’s pass@8 score seems to be saying “the correct solution was present in 8 attempts, but the model doesn’t actually know which result is correct”. That shouldn’t count.
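For reference, the standard pass@k estimator from the original Codex/HumanEval paper makes the distinction explicit: it only measures whether a correct solution exists among the samples, not whether the model can identify it. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    one of k samples drawn from n generations (c of them correct) passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 8 samples per problem, 2 of which pass the unit tests:
print(pass_at_k(n=8, c=2, k=1))  # 0.25: chance a single attempt succeeds
print(pass_at_k(n=8, c=2, k=8))  # 1.0: a correct solution appears somewhere among the 8
```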
Yeah, I wanted to include that paragraph but it didn’t fit in the screenshot. It does seem slightly redeeming for the description. Certainly the authors hedged pretty heavily.
Still, I think that people are not saving days by chatting with AI agents on slack. So there’s a vibe here which seems wrong. The vibe is that these agents are unreliable but are offering very significant benefits. That is called into question by the METR report showing they slowed developers down. There are problems with that report and I would love to see some follow-up work to be more certain.
I appreciate your research on the SOTA SWEBench-Verified scores! That’s a concrete prediction we can evaluate (less important than real world performance, but at least more objective). Since we’re now in mid-late 2025 (not mid 2025), it appears that models are slightly behind their projections even for pass@k, but certainly they were in the right ballpark!
Sorry, this is the most annoying kind of nitpicking on my part, but since I guess it’s probably relevant here (and for your other comment responding to Stanislav down below), the center point of the year is July 2, 2025. So we’re just two weeks past the absolute mid-point – that’s 54.4% of the way through the year.
Also, the codex-1 benchmarks were released on May 16, while Claude 4’s were announced on May 22 (certainly before the midpoint).
The prediction is correct on all counts, and perhaps slightly understates progress (though it obviously makes weak/ambiguous claims across the board).
The claim that “coding and research agents are beginning to transform their professions” is straightforwardly true (e.g. 50% of Google lines of code are now generated by AI). The METR study was concentrated in March (which is early 2025).
And it is not currently “mid-late 2025”, it is 16 days after the exact midpoint of the year.
Where is that 50% number from? Perhaps you are referring to this post from google research. If so, you seem to have taken it seriously out of context. Here is the text before the chart that shows 50% completion:
With the advent of transformer architectures, we started exploring how to apply LLMs to software development. LLM-based inline code completion is the most popular application of AI applied to software development: it is a natural application of LLM technology to use the code itself as training data. The UX feels natural to developers since word-level autocomplete has been a core feature of IDEs for many years. Also, it’s possible to use a rough measure of impact, e.g., the percentage of new characters written by AI. For these reasons and more, it made sense for this application of LLMs to be the first to deploy.
Our earlier blog describes the ways in which we improve user experience with code completion and how we measure impact. Since then, we have seen continued fast growth similar to other enterprise contexts, with an acceptance rate by software engineers of 37%[1] assisting in the completion of 50% of code characters[2]. In other words, the same amount of characters in the code are now completed with AI-based assistance as are manually typed by developers. While developers still need to spend time reviewing suggestions, they have more time to focus on code design.
This is referring to inline code completion—so it’s more like advanced autocomplete than an AI coding agent. It’s hard to interpret this number, but it seems very unlikely that it means half the coding is being done by AI; much more likely, it is often easy to predict how a line of code will end given the first half of that line and the previous context. Probably 15-20% of what I type into a standard linux terminal is autocompleted without AI?
Also, the right metric is how much AI assistance is speeding up coding. I know of only one study on this, from METR, which showed that it is slowing down coding.
Two days later, is this still a fail? ChatGPT agent is supposed to do exactly that. There also seems to be a research model within OpenAI that is capable of getting gold on the IMO without any tools.
Maybe it does not meet the expectations yet. Maybe it will with the GPT-5 release. We do not know if the new unreleased model is capable of helping with research. However, it’s worth considering the possibility that it could be on a slightly slower timeline rather than a complete miss.
i wonder to what extent leadership at openai see ai 2027 as a bunch of milestones that they need to meet, to really be as powerful/scary as they’re said to be.
e.g. would investors/lenders be more hesitant if openai seems to be ‘lagging behind’ ai 2027 predictions?
But it isn’t August or September yet. Maybe someone will end up actually creating capable agents. In addition, the number of operations used for creating Grok 4 was estimated as 4e27--6e27, which seems to align with the forecast. The research boost rate from Grok 4 or a potentially tweaked model wasn’t estimated. Maybe Grok 4 or an AI released in August will boost research speed?
It was indicated in the opening slide of Grok 4 release livestream that Grok 4 was pretrained with the same amount of compute as Grok 3, which in turn was pretrained on 100K H100s, so probably 3e26 FLOPs (40% utilization for 3 months with 1e15 FLOP/s per chip). RLVR has a 3x-4x lower compute utilization than pretraining, so if we are insisting on counting RLVR in FLOPs, then 3 months of RLVR might be 9e25 FLOPs, for the total of 4e26 FLOPs.
Stargate Abilene will be 400K chips in GB200 NVL72 racks in 2026, which is 10x more FLOP/s than 100K H100s. So it’ll be able to train 4e27-8e27 FLOPs models (pretraining and RLVR, in 3+3 months), and it might be early 2027 when they are fully trained. (Google is likely to remain inscrutable in their training compute usage, though Meta might also catch up by then.)
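(A back-of-the-envelope version of the arithmetic above; the per-chip throughput, utilization, and durations are the assumptions stated in the comment, not official figures:)

```python
# Back-of-the-envelope, using the assumptions in the comment above
# (not official figures): 100K H100s, ~1e15 FLOP/s usable per chip,
# 40% utilization, ~3 months of pretraining.
chips = 100_000
flops_per_chip = 1e15            # assumed per-chip throughput
utilization = 0.40
seconds = 3 * 30 * 24 * 3600     # ~3 months

pretraining = chips * flops_per_chip * utilization * seconds
print(f"pretraining: {pretraining:.1e} FLOPs")   # ~3e26

# RLVR at 3x-4x lower compute utilization for another ~3 months.
rlvr = pretraining / 3.5
print(f"RLVR:        {rlvr:.1e} FLOPs")          # ~9e25

# 400K GB200-class chips ~= 10x the FLOP/s of 100K H100s, 3+3 months.
print(f"2026 system: {10 * (pretraining + rlvr):.1e} FLOPs")  # ~4e27
```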
the number of operations used for creating Grok 4 was estimated as 4e27--6e27
(I do realize it’s probably some sort of typo, either yours or in your unnamed source. But 10x is almost 2 years of even the current fast funding-fueled scaling, that’s not a small difference.)
We’ve been going back and forth on this a bit—it seems like your model suggests AGI in 2027 is pretty unlikely?
That is, we see the first generation of massively scaled RLVR around 2026/2027. So it kind of has to work out of the box for AGI to arrive that quickly?
I suppose this is just speculation though. Maybe it’s useful enough that the next generation is somehow much, much faster to arrive?
That is, we see the first generation of massively scaled RLVR around 2026/2027. So it kind of has to work out of the box for AGI to arrive that quickly?
By 2027, we’ll also have 10x scaled-up pretraining compared to current models (trained on 2024 compute). And correspondingly scaled RLVR, with many diverse tool-using environments that are not just about math and coding contest style problems. If we go 10x lower than current pretraining, we get original GPT-4 from Mar 2023, which is significantly worse than the current models. So with 10x higher pretraining than current models, the models of 2027 might make significantly better use of RLVR training than the current models can.
Also, 2 years might be enough time to get some sort of test-time training capability started, either with novel or currently-secret methods, or by RLVRing models to autonomously do post-training on variants of themselves to make them better at particular sources of tasks during narrow deployment. Apparently Sutskever’s SSI is rumored to be working on the problem (at 39:25 in the podcast), and overall this seems like the most glaring currently-absent faculty. (Once it’s implemented, something else might end up a similarly obvious missing piece.)
it seems like your model suggests AGI in 2027 is pretty unlikely?
I’d give it 10% (for 2025-2027). From my impression of the current capabilities and the effect of scaling so far, the remaining 2 OOMs of compute seem like a 30% probability of getting there (by about 2030), with a third of it in the first 10x of the remaining scaling, that is 10% with 2026 compute (for 2027 models). After 2029, scaling slows down to a crawl (relatively speaking), so maybe another 50% for the 1000x of scaling in 2030-2045 when there’ll also be time for any useful schlep, with 20% remaining for 2045+ (some of it from a coordinated AI Pause, which I think is likely to last if at all credibly established). If the 5 GW AI training systems don’t get built in 2028-2029, they are still likely to get built a bit later, so this essentially doesn’t influence predictions outside the 2029-2033 window, some probability within it merely gets pushed a bit towards the future.
So this gives a median of about 2034. Once AGI is still not working in the early 2030s even with more time for schlep, probability at that level of compute starts going down, so 2030s are front-loaded in probability even though compute is not scaling faster in the early 2030s than later.
For reference, I’d also bet on 8+ hour task lengths (on METR’s benchmark[1]) by 2027. Probably significantly earlier; maybe early 2026, or even the end of this year. Would not be shocked if OpenAI’s IMO-winning model already clears that.
You say you expect progress to stall at 4-16 hours because solving such problems would require AIs to develop sophisticated models of them. My guess is that you’re drawing on intuitions about the task lengths at which that would be necessary for a human. LLMs, however, are not playing by the same rules: where we might need a new model, they may be able to retrieve a stored template solution. I don’t think we really have any idea at what task length this trick would stop working for them. I could see it being “1 week”, or “1 month”, or “>1 year”, or “never”.
I do expect “<1 month”, though. Or rather, that even if the LLM architecture is able to support arbitrarily big templates, the scaling of data and compute will run out before this point; and then plausibly the investment and the talent pools would dry up as well (after LLMs betray everyone’s hopes of AGI-completeness).
Not sure what happens if we do get to “>1 year”, because on my model, LLMs might still not become AGIs despite that. Like, they would still be “solvers of already solved problems”, except they’d be… able to solve… any problem in the convex hull of the problems any human ever solved in 1 year...? I don’t know, that would be very weird; but things have already gone in very weird ways, and this is what the straightforward extrapolation of my current models says. (We do potentially die there.[2])
Aside: On my model, LLMs are not on track to hit any walls. They will keep getting better at the things they’ve been getting better at, at the same pace, for as long as the inputs to the process (compute, data, data progress, algorithmic progress) keep scaling at the same rate. My expectation is instead that they’re just not going towards AGI, so “no walls in their way” doesn’t matter; and that they will run out of fuel before the cargo cult of them becomes Singularity-tier transformative.
Recall that it uses unrealistically “clean” tasks and accepts unviable-in-practice solutions: the corresponding horizons for real-world problem-solving seem much shorter. As do the plausibly-much-more-meaningful 80%-completion horizons – the latter currently sits at 26 minutes. (Something like 95%-completion horizons may actually be the most representative metric, though I assume there are some issues with estimating that.)
We should pause to note that a Clippy² still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. [...] When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)
Aside: On my model, LLMs are not on track to hit any walls. They will keep getting better at the things they’ve been getting better at, at the same pace, for as long as the inputs to the process (compute, data, data progress, algorithmic progress) keep scaling at the same rate. My expectation is instead that they’re just not going towards AGI, so “no walls in their way” doesn’t matter; and that they will run out of fuel before the cargo cult of them becomes Singularity-tier transformative.
Ok, but surely there has to be something they aren’t getting better at (or are getting better at too slowly). Under your model they have to hit a wall in this sense.
I think your main view is that LLMs won’t ever complete actually hard tasks, and current benchmarks just aren’t measuring actually hard tasks or have other measurement issues? This seems inconsistent with saying they’ll just keep getting better, though, unless you’re hypothesizing truly insane benchmark flaws, right?
Like, if they stop improving at <1 month horizon lengths (as you say immediately above the text I quoted) that is clearly a case of LLMs hitting a wall right? I agree that compute and resources running out could cause this, but it’s notable that we expect ~1 month in not that long, like only ~3 years at the current rate.
it’s notable that we expect ~1 month in not that long, like only ~3 years at the current rate
That’s only if the faster within-RLVR rate that has been holding during the last few months persists. On my current model, 1 month task lengths at 50% happen in 2030-2032, since compute (being the scarce input of scaling) slows down compared to today, and I don’t particularly believe in incremental algorithmic progress as it’s usually quantified, so it won’t be coming to the rescue.
Compared to the post I did on this 4 months ago, I have even lower expectations that the 5 GW training systems (for individual AI companies) will arrive on trend in 2028; they’ll probably get delayed to 2029-2031. And I think the recent RLVR acceleration of the pre-RLVR trend only pushes it forward a year without making it faster: the changed “trend” of the last few months is merely RLVR chip-hours catching up to pretraining chip-hours, which is already essentially over. Though there are still no GB200 NVL72 sized frontier models and probably no pretraining scale RLVR on GB200 NVL72s (which would get better compute utilization), so that might give the more recent “trend” another off-trend push first, perhaps as late as early 2026, but even then it’s not yet a whole year ahead of the old trend.
Like, if they stop improving at <1 month horizon lengths (as you say immediately above the text I quoted) that is clearly a case of LLMs hitting a wall right?
I distinguish “the LLM paradigm hitting a wall” and “the LLM paradigm running out of fuel for further scaling”.
I agree that compute and resources running out could cause this, but it’s notable that we expect ~1 month in not that long, like only ~3 years at the current rate.
Yes, precisely. Last I checked, we expected scaling to run out by 2029ish, no?
Ah, reading the comments, I see you expect there to be some inertia… Okay, 2032 / 7 more years would put us at “>1 year” task horizons. That does make me a bit more concerned. (Though 80% reliability is several doublings behind, and I expect tasks that involve real-world messiness to be even further behind.)
Ok, but surely there has to be something they aren’t getting better at (or are getting better at too slowly)
“Ability to come up with scientific innovations” seems to be one.
Like, I expect they are getting better at the underlying skill. If you had a benchmark which measures some toy version of “produce scientific innovations” (AidanBench?), and you plotted frontier models’ performance on it against time, you would see the number going up. But it currently seems to lag way behind other capabilities, and I likewise don’t expect it to reach dangerous heights before scaling runs out.
The way I would put it, the things LLMs are strictly not improving on are not “specific types of external tasks”. What I think they’re not getting better at – because it’s something they’ve never been capable of doing – are specific cognitive algorithms which allow certain cognitive tasks to be completed in a dramatically more compute-efficient manner. We’ve talked about this some before.
I think that, in the limit of scaling, the LLM paradigm is equivalent to AGI, but that it’s not a very efficient way to approach this limit. And it’s less efficient along some dimensions of intelligence than along others.
This paradigm attempts to scale certain modules that a generally intelligent mind would have to ridiculous levels of power, in order to make up for the lack of other necessary modules. This will keep working to improve performance across all tasks, as long as you keep feeding LLMs more data and compute. But there seem to be only a few “GPT-4 to GPT-5” jumps left, and I don’t think it’d be enough.
They are sometimes able to make acceptable PRs, usually when context gathering for the purpose of iteratively building up a model of the relevant code is not a required part of generating said PR.
It seems to me that current-state LLMs learn hardly anything from the context, since they have trouble fitting it into their attention span. For example, GPT-5 can create fun stuff from just one prompt, and an unpublished LLM solved five out of six problems of IMO 2025, where the six problems together can be expressed in about 3k bytes. However, METR found that “on 18 real tasks from two large open-source repositories, early-2025 AI agents often implement functionally correct code that cannot be easily used as-is, because of issues with test coverage, formatting/linting, or general code quality.”
I strongly suspect that this bottleneck will be ameliorated by using neuralese[1] with big internal memory.
Neuralese with big internal memory
The Meta paper which introduced neuralese had GPT-2 trained to have the thought vector at the end fed back into the beginning. Alas, the number of bits transferred is equal to the number of bits in a floating-point number multiplied by the size of the final layer. A CoT, by contrast, generates only ~16.6 extra bits of information per token.
At the cost of an absolute loss of interpretability, neuralese on steroids could have an LLM of GPT-3’s scale transfer tens of millions of bits[2] in the latent space. Imagine GPT-3 175B (which had 96 layers and 12288 neurons in each) receiving an augmentation using the last layer’s results as a steering vector at the beginning, the second-to-last layer as a steering vector at the second layer, etc. Or passing the steering vectors through a matrix. These amplifications at most double the compute required to run GPT-3, while requiring a few extra megabytes of dynamic memory.
For comparison, the human brain’s short-term memory alone is described by the activations of around 86 billion neurons. And that’s ignoring medium-term memory and long-term memory...
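(A quick order-of-magnitude check on the footnote; the fp16 activations and the 96×12288 GPT-3 shape come from above, while the ~100k-token vocabulary is my assumption. Treat it as a sketch, not a claim about any specific architecture:)

```python
import math

# Bits per generated CoT token ~= log2(vocabulary size).
vocab = 100_000                          # assumed vocabulary size
print(f"CoT token: ~{math.log2(vocab):.1f} bits")   # ~16.6

# Bits per step if every layer's fp16 activations are fed back in
# as steering vectors (the "neuralese on steroids" setup above).
layers, width, bits_per_act = 96, 12288, 16
neuralese_bits = layers * width * bits_per_act
print(f"neuralese: ~{neuralese_bits/1e6:.0f} million bits "
      f"(~{neuralese_bits/8/1e6:.1f} MB per step)")  # ~19M bits, ~2.4 MB
```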
I think if this were right LLMs would already be useful for software engineering and able to make acceptable PRs.
I think LLMs can be useful for software engineering and can sometimes write acceptable PRs. (I’ve very clearly seen both of these first hand.) Maybe you meant something slightly weaker, like “AIs would be able to write acceptable PRs at a rate of >1/10 on large open source repos”? I think this is already probably true, at least with some scaffolding and inference time compute. Note that METR’s recent results were on 3.7 sonnet.
I’m referring to METR’s recent results. Can you point to any positive results on LLMs writing acceptable PRs? I’m sure that they can in some weak sense e.g. a sufficiently small project with sufficiently low standards, but as far as I remember the METR study concluded zero acceptable PRs in their context.
METR found that 0/4 of the PRs which passed test cases and which they reviewed were also acceptable on review. This was for 3.7 sonnet on large open source repos with default infrastructure.
The rate at which PRs passed test cases was also low, but if you’re focusing on the PR being viable to merge conditional on passing test cases, the “0/4” number is what you want. (And this is consistent with a true rate of 10%, or some chance of 35%, of PRs being mergeable conditional on passing test cases; we don’t have a very large sample size here.)
I don’t think this is much evidence that AI can’t sometimes write acceptable PRs in general, and there are examples of AIs doing this. On small projects I’ve worked on, AIs from a long time ago have written a big chunk of code ~zero-shot. Anecdotally, I’ve heard of people having success with AIs completing tasks zero-shot. I don’t know what you mean by “PR” that doesn’t include this.
I’m sure that they can in some weak sense e.g. a sufficiently small project with sufficiently low standards, but as far as I remember the METR study concluded zero acceptable PRs in their context.
Hence sharing here—I’m not buying (at least for now) because I’m curious where it ends up, but obviously I think “Wyeth wins” shares are at a great price right now ;)
Particularly after my last post, I think my lesswrong writing has had a bit too high of a confidence / effort ratio. Possibly I just know the norms of this site well enough lately that I don’t feel as much pressure to write carefully. I think I’ll limit my posting rate a bit while I figure this out.
Yeah, I was thinking greater effort is actually necessary in this case. For context, my lower-effort posts are usually more popular. Also the ones that focus on LLMs, which is really not my area of expertise.
The hedonic treadmill exists because minds are built to climb utility gradients—absolute utility levels are not even uniquely defined, so as long as your preferences are time-consistent you can just renormalize before maximizing the expected utility of your next decision.
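(Spelled out, the renormalization point is just that expected-utility maximization is invariant under positive affine rescaling, so subtracting off your current baseline before each decision changes nothing. A minimal sketch:)

```latex
% For any rescaling a > 0 and baseline b (e.g. your current hedonic level):
\[
\arg\max_{d}\; \mathbb{E}\!\left[\, a\, U(d) + b \,\right]
   \;=\; \arg\max_{d}\; \mathbb{E}\!\left[\, U(d) \,\right]
\]
% so only utility differences ever matter for the next decision;
% the absolute level can be renormalized away before each choice.
```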
I find this vaguely comforting. It’s basically a decision-theoretic and psychological justification for stoicism.
(must have read this somewhere in the sequences?)
I think self-reflection in bounded reasoners justifies some level of “regret,” “guilt,” “shame,” etc., but the basic reasoning above should hold to first order, and these should all be treated as corrections and for that reason should not get out of hand.
Perhaps LLMs are starting to approach the intelligence of today’s average human: capable of only limited original thought, unable to select and autonomously pursue a nontrivial coherent goal across time, learned almost everything they know from reading the internet ;)
This doesn’t seem to be reflected in the general opinion here, but it seems to me that LLMs are plateauing and possibly have already plateaued a year or so ago. Scores on various metrics continue to go up, but this tends to provide weak evidence because they’re heavily gamed and sometimes leak into the training data. Still, those numbers overall would tend to update me towards short timelines, even with their unreliability taken into account—however, this is outweighed by my personal experience with LLMs. I just don’t find them useful for practically anything. I have a pretty consistently correct model of the problems they will be able to help me with, and it’s not a lot—maybe a broad introduction to a library I’m not familiar with or detecting simple bugs. That model has worked for a year or two without expanding the set much. Also, I don’t see any applications to anything economically productive except for fluffy chatbot apps.
Huh, o1 and the latest Claude were quite huge advances to me. Basically within the last year, LLMs for coding went from “occasionally helpful, maybe like a 5-10% productivity improvement” to “my job now is basically to instruct LLMs to do things; depending on the task, a 30% to 2x productivity improvement”.
I’m in Canada so can’t access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I’m not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine.
Use Chatbot Arena, both versions of Claude 3.5 Sonnet are accessible in Direct Chat (third tab). There’s even o1-preview in Battle Mode (first tab), you just need to keep asking the question until you get o1-preview. In general Battle Mode (for a fixed question you keep asking for multiple rounds) is a great tool for developing intuition about model capabilities, since it also hides the model name from you while you are evaluating the response.
Just an FYI unrelated to the discussion—all versions of Claude are available in Canada through Anthropic, you don’t even need third party services like Poe anymore.
Base model scale has only increased maybe 3-5x in the last 2 years, from 2e25 FLOPs (original GPT-4) up to maybe 1e26 FLOPs[1]. So I think to a significant extent the experiment of further scaling hasn’t been run, and the 100K H100s clusters that have just started training new models in the last few months promise another 3-5x increase in scale, to 2e26-6e26 FLOPs.
possibly have already plateaued a year or so ago
Right, the metrics don’t quite capture how smart a model is, and the models haven’t been getting much smarter for a while now. But it might be simply because they weren’t scaled much further (compared to original GPT-4) in all this time. We’ll see in the next few months as the labs deploy the models trained on 100K H100s (and whatever systems Google has).
This is 3 months on 30K H100s, $140 million at $2 per H100-hour, which is plausible, but not rumored about specific models. Llama-3-405B is 4e25 FLOPs, but not MoE. Could well be that 6e25 FLOPs is the most anyone trained for with models deployed so far.
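(For reference, the back-of-the-envelope behind those numbers, with the ~1e15 FLOP/s per chip and ~40% utilization being my own assumptions:)

```python
# 30K H100s for ~3 months at $2/H100-hour:
chips, hours = 30_000, 3 * 30 * 24
print(f"cost:  ${chips * hours * 2 / 1e6:.0f}M")   # ~$130M, ballpark of the $140M above

# FLOPs at an assumed ~1e15 FLOP/s per chip and ~40% utilization:
flops = chips * 1e15 * 0.4 * hours * 3600
print(f"FLOPs: {flops:.1e}")                       # ~9e25, i.e. roughly 1e26
```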
I’ve noticed they perform much better on graduate-level ecology/evolution questions (in a qualitative sense—they provide answers that are more ‘full’ as well as technically accurate). I think translating that into a “usefulness” metric is always going to be difficult though.
The last few weeks I felt the opposite of this. I kind of go back and forth on thinking they are plateauing and then I get surprised with the new Sonnet version or o1-preview. I also experiment with my own prompting a lot.
I’ve been waiting to say this until OpenAI’s next larger model dropped, but this has now failed to happen for so long that it’s become its own update, and I’d like to state my prediction before it becomes obvious.
An ASI perfectly aligned to me must literally be a smarter version of myself. Anything less than that is a compromise between my values and the values of society. Such a compromise at its extreme fills me with dread. I would much rather live in a society of some discord between many individually aligned ASIs than build a benevolent god.
An ASI aligned to a group of people likely should dedicate sovereign slivers of compute (optimization domains) to each of those people, and those people could do well with managing their domains with their own ASIs aligned to each of them separately. Optimization doesn’t imply a uniform pureed soup; it’s also possible to optimize autonomy, coordination, and interaction, without mixing them up.
An ASI perfectly aligned to me must literally be a smarter version of myself.
Values judge what should be done, but also what you personally should be doing. An ASI value aligned to you will be doing the things that should be done (according to you, on reflection), but you wouldn’t necessarily endorse that you personally should be doing those things. Like, I want the world to be saved, but I don’t necessarily want to be in a position to need to try to save the world personally.
So an ASI perfectly aligned to you might help uplift you into a smarter version of yourself as one of its top priorities, and then go on to do various other things you’d approve of on reflection. But you wouldn’t necessarily endorse that it’s the smarter version of yourself that is doing those other things, you are merely endorsing that they get done.
I’m confused about that. I think you might be wrong, but I’ve heard this take before. If what you want is something that looks like a benevolent god, but one according to your own design, then that’s the “cosmopolitan empowerment by I just want cosmopolitanism” scenario, which I don’t trust; so if I had the opportunity to design an AI, I would do my best to guarantee its cosmopolitanism-as-in-a-thing-others-actually-approve-of, for basically “values level LDT” reasons. See also the interdimensional council of cosmopolitanisms.
Anything less than that is a compromise between my values and the values of society.
I think there’s more leeway here. E.g. instead of a copy of you, a “friend” ASI.
I would much rather live in a society of some discord between many individually aligned ASIs than build a benevolent god
A benevolent god that understands your individual values and respects them seems pretty nice to me. Especially compared to a world of competing, individually aligned ASIs. (if your values are in the minority)
@Thomas Kwa will we see task length evaluations for Claude Opus 4 soon?
Anthropic reports that Claude can work on software engineering tasks coherently for hours, but it’s not clear if this means it can actually perform tasks that would take a human hours. I am slightly suspicious because they reported that Claude was making better use of memory on Pokémon, but this did not actually cash out as improved play. This seems like a fairly decisive test of my prediction that task lengths would stagnate at this point; if it does succeed at hours long tasks, I will want to see a careful evaluation of which tasks may or may not have been leaked, are the tasks cleaner than typical hours long SE tasks, etc.
I don’t run the evaluations but probably we will; no timeframe yet though as we would need to do elicitation first. Claude’s SWE-bench Verified scores suggest that it will be above 2 hours on the METR task set; the benchmarks are pretty similar apart from their different time annotations.
That’s a bit higher than I would have guessed. I compared the known data points that have both SWE-bench scores and METR medians (Sonnet 3.5, 3.6, 3.7, o1, o3, o4-mini) and got an r^2 = 0.96 model assuming linearity between log(METR_median) and log(swe-bench-error).
That gives an estimate more like 110 minutes for a SWE-bench score of 72.7%, which works out to a Sonnet doubling time of ~3.3 months. (If I throw out o4-mini, the estimate is ~117 minutes… still below 120.)
It would also imply that an 85% SWE-bench score is something like a 6-6.5 hour METR median.
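(A sketch of the fit being described—a linear regression of log(METR median) on log(SWE-bench error), then extrapolating. The data tuples below are placeholders, not the actual Sonnet/o-series numbers:)

```python
import numpy as np

# Placeholder (swe_bench_score, metr_median_minutes) pairs -- substitute
# the real Sonnet 3.5/3.6/3.7, o1, o3, o4-mini numbers to reproduce the fit.
data = [(0.49, 18), (0.56, 28), (0.62, 54), (0.69, 90)]

x = np.log([1.0 - score for score, _ in data])   # log(SWE-bench error)
y = np.log([minutes for _, minutes in data])     # log(METR 50% horizon, minutes)
slope, intercept = np.polyfit(x, y, 1)           # linear fit in log-log space

def predicted_median_minutes(swe_bench_score: float) -> float:
    return float(np.exp(intercept + slope * np.log(1.0 - swe_bench_score)))

print(predicted_median_minutes(0.727))   # the 72.7% score discussed above
print(predicted_median_minutes(0.85))    # the 85% forecast
```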
Since reasoning trace length increases with more steps of RL training (unless intentionally constrained), probably underlying scaling of RL training by AI companies will be observable in the form of longer reasoning traces. Claude 4 is more obviously a pretrained model update, not necessarily a major RLVR update (compared to Claude 3.7), and coherent long task performance seems like something that would greatly benefit from RLVR if it applies at all (which it plausibly does).
So I don’t particularly expect Claude 4 to be much better on this metric, but some later Claude ~4.2-4.5 update with more RLVR post-training released in a few months might do much better.
Sure, but trends like this only say anything meaningful across multiple years, any one datapoint adds almost no signal, in either direction. This is what makes scaling laws much more predictive, even as they are predicting the wrong things. So far there are no published scaling laws for RLVR, the literature is still developing a non-terrible stable recipe for the first few thousand training steps.
This has been going on for months; on the bullish side (for AI progress, not human survival), this means some form of self-improvement is well behind the capability frontier. On the bearish side, we may not expect a further speed-up on the log scale (since it’s already factored into some calculations).
I did not expect this degree of progress so soon; I am now much less certain about the limits of LLMs and less prepared to dismiss very short timelines.
With that said… the problems that it has solved do seem to be somewhat exhaustive-search flavored. For instance, it apparently solved an open math problem, but this involved arranging a bunch of spheres. I’m not sure to what degree LLM insight was required beyond just throwing a massive amount of compute at trying possibilities. The self-improvements GDM reports are similar—like faster matrix multiplication in, I think, the 4x4 case. I do not know enough about these areas to judge whether AI is essential here or whether a vigorous proof search would work. At the very least, the system does seem to specialize in problems with highly verifiable solutions. I am largely convinced, but not completely convinced.
Also, for the last couple of months whenever I’ve asked why LLMs haven’t produced novel insights, I’ve often gotten the response “no one is just letting them run long enough to try.” Apparently GDM did try it (as I expected) and it seems to have worked somewhat well (as I did not expect).
Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI.
But I do have some quick thoughts as well:
Kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).
It seems to me like AlphaEvolve is more-or-less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor) notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to the improvement of AI hardware. What AlphaEvolve seems to do is to unify all of that into a superhuman model for those multiple uses. In the accompanying podcast they give us some further information:
The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
They have not tried to distill all that data into a new model yet, which seems strange to me considering they’ve had it for a year now.
They say that a lot of improvements come from the base model’s quality.
They do present the whole thing as part of research rather than as a product.
So yeah, I can definitely see a path to large gains in the future, though for now those are still on similar timetables per their own admission. They expect further improvements when base models improve, and are hoping that future versions of AlphaEvolve can in turn shorten the training time for models, improve the hardware pipeline, and improve models in other ways. And for your point about novel discoveries, previous Alpha models seemed to already be able to do the same categories of research back in 2023, on mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare with previous models of the same classification.
This is also a very small thing to keep in mind, but GDM doesn’t often share the actual results of their models’ work as usable/replicable papers, which has caused experts to cast some doubt on results in the past. It’s hard to verify their results, since they’ll be keeping them close to their chests.
Unfortunate consequence of sycophantic ~intelligent chatbots: everyone can get their theories parroted back to them and validated. Particularly risky for AGI, where the chatbot can even pretend to be running your cognitive architecture. Want to build a neuro-quantum-symbolic-emergent-consciousness-strange-loop AGI? Why bother, when you can just put that all in a prompt!
A lot of new user submissions these days to LW are clearly some poor person who was sycophantically encouraged by an AI to post their crazy theory of cognition or consciousness or recursion or social coordination on LessWrong after telling them their ideas are great. When we send them moderation messages we frequently get LLM-co-written responses, and sometimes they send us quotes from an AI that has evaluated their research as promising and high-quality as proof that they are not a crackpot.
Basic sanity check: We can align human children, but can we align any other animals? NOT to the extent that we would trust them with arbitrary amounts of power, since they obviously aren’t smart enough for this question to make much sense. Just, like, are there other animals that we’ve made care about us at least “a little bit”? Can dogs be “well trained” in a way where they actually form bonds with humans and will go to obvious personal risk to protect us, or not eat us even if they’re really hungry and clearly could? How about species further from us on the evolutionary tree, like hunting falcons? Where specifically is the line?
As well as the “theoretical—empirical” axis, there is an “idealized—realistic” axis. The former distinction is about the methods you apply (with extremes exemplified by rigorous mathematics and blind experimentation, respectively). The latter is a quality of your assumptions / paradigm. Highly empirical work is forced to be realistic, but theoretical work can be more or less idealized. Most of my recent work has been theoretical and idealized, which is the domain of (de)confusion. Applied research must be realistic, but should pragmatically draw on theory and empirical evidence. I want to get things done, so I’ll pivot in that direction over time.
Sometimes I wonder if people who obsess over the “paradox of free will” are having some “universal human experience” that I am missing out on. It has never seemed intuitively paradoxical to me, and all of the arguments about it seem either obvious or totally alien. Learning more about agency has illuminated some of the structure of decision making for me, but hasn’t really affected this (apparently) fundamental inferential gap. Do some people really have this overwhelming gut feeling of free will that makes it repulsive to accept a lawful universe?
I used to, as a child. I did accept a lawful universe, but I thought my perception of free will was in tension with that, so that perception must be “an illusion”.
My mother kept trying to explain to me that there was no tension between these things, because it was correct that my mind made its own decisions rather than some outside force. I didn’t understand what she was saying though. I thought she was just redefining ‘free will’ from a claim that human brains effectively had a magical ability to spontaneously ignore the laws of physics to a boring tautological claim that human decisions are made by humans rather than something else.
I changed my mind on this as a teenager. I don’t quite remember how, it might have been the sequences or HPMOR again. I realised that my imagination had still been partially conceptualising the “laws of physics” as some sort of outside force, a set of strings pulling my atoms around, rather than as a predictive description of me and the universe. Saying “the laws of physics make my decisions, not me” made about as much sense as saying “my fingers didn’t move, my hand did.” That was what my mother had been trying to tell me.
I don’t think so, as I had success explaining away the paradox with the concept of “different levels of detail”—saying that free will is a very high-level concept and further observations reveal a lower-level view, calling upon an analogy with the segment tree from algorithmic programming.
(A segment tree is a data structure that replaces an array, allowing one to modify its values and compute a given function over arbitrary subarrays efficiently. It is based on a tree of nodes, each of them representing a certain subarray; each position is therefore handled by several—specifically, O(log n)—nodes.)
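(For readers who haven’t met it, a minimal sketch of the data structure being referenced, specialized here to range-sum queries with point updates:)

```python
class SegmentTree:
    """Minimal segment tree: point update + range-sum query, both O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        # Leaves hold the array; each internal node i sums its two children.
        for i, v in enumerate(values):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, pos, value):
        # Overwrite one position, then fix the O(log n) ancestors above it.
        i = self.n + pos
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, left, right):
        """Sum of values[left:right] (half-open interval)."""
        res = 0
        lo, hi = self.n + left, self.n + right
        while lo < hi:
            if lo & 1:
                res += self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                res += self.tree[hi]
            lo //= 2
            hi //= 2
        return res

t = SegmentTree([5, 2, 7, 1])
t.update(1, 10)
print(t.query(0, 3))  # 5 + 10 + 7 = 22
```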
This might be related to whether you see yourself as a part of the universe, or as an observer. If you are an observer, the objection is like “if I watch a movie, everything in the movie follows the script, but I am outside the movie, therefore outside the influence of the script”.
If you are religious, I guess your body is a part of the universe (obeys the laws of gravity etc.), but your soul is the impartial observer. Here the religion basically codifies the existing human intuitions.
It might also depend on how much you are aware of the effects of your environment on you. This is a learned skill; for example little kids do not realize that they are hungry… they just get kinda angry without knowing why. It requires some learning to realize “this feeling I have right now—it is hunger, and it will probably go away if I eat something”. And I guess the more knowledge of this kind you accumulate, the easier it is to see yourself as a part of the universe, rather than being outside of it and only moved by “inherently mysterious” forces.
If instead of building LLMs, tech companies had spent billions of dollars designing new competing search engines that had no ads but might take a few minutes to run and cost a few cents per query, would the result have been more or less useful?
Rather less useful to me personally as a software developer.
Besides that, I feel like this question is maybe misleading? If ex. Google built a new search engine that could answer queries like its current AI-powered search summaries, or like ChatGPT, wouldn’t that have to be some kind of language model anyway? Is there another class of thing besides AGI that could perform as well at that task?
(I assume you’re not suggesting just changing the pricing model of existing-style search engines, which already had a market experiment (ex. Kagi) some years ago with only mild success.)
I think that would require text comprehension too. I guess it’s an interesting question if you can build an AI that can comprehend text but not produce it?
My impression is that the decline of search engines has little to do with search ads. It has more to do with a decline in public webpage authoring in favor of walled gardens, chat systems, etc.: new organic human-written material that once would have been on a public forum site (or home page!) is today often instead in an unindexable Discord chat or inside an app. Meanwhile, spammy content on the public Web has continued to escalate; and now LLMs are helping make more and more of it.
But most of LLMs’ knowledge comes from the public Web, so clearly there is still a substantial amount of useful content on it, and maybe if search engines had remained good enough at filtering spam fewer people would have fled to Discord.
To what extent would a proof about AIXI’s behavior be normative advice?
Though AIXI itself is not computable, we can prove some properties of the agent—unfortunately, there are fairly few examples because of the “bad universal priors” barrier discovered by Jan Leike. In the sequential case we only know things like e.g. it will not indefinitely keep trying an action that yields minimal reward, though we can say more when the horizon is 1 (which reduces to the predictive case in a sense). And there are lots of interesting results about the behavior of Solomonoff induction (roughly speaking the predictive part of AIXI).
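(For concreteness, AIXI’s action rule, written schematically in roughly Hutter’s notation—expectimax planning over all environments, i.e. programs q on a universal machine U, weighted by a Solomonoff-style simplicity prior. This is the object the cited results are about:)

```latex
% AIXI's action choice at time k with planning horizon m (schematic form):
\[
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
   \left[\, r_k + \cdots + r_m \,\right]
   \sum_{q \,:\, U(q,\, a_1 \dots a_m) \,=\, o_1 r_1 \dots o_m r_m} 2^{-\ell(q)}
\]
% U is a universal (monotone) Turing machine and \ell(q) the length of
% program q, so environments are weighted by their simplicity.
```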
For the sake of argument though, assume we could prove some (more?) interesting statements about AIXI’s strategy—certainly this is possible for us computable beings. But would we want to take those statements as advice, or are we too ignorant to benefit from cargo-culting an inscrutable demigod like AIXI?
Can AI X-risk be effectively communicated by analogy to climate change? That is, the threat isn’t manifesting itself clearly yet, but experts tell us it will if we continue along the current path.
Though there are various disanalogies, this specific comparison seems both honest and likely to be persuasive to the left?
I don’t like it. Among various issues, people already muddy the waters by erroneously calling climate change an existential risk (rather than what it was, a merely catastrophic one, before AI timelines made any worries about climate change in the year 2100 entirely irrelevant), and it’s extremely partisan-coded. And you’re likely to hear that any mention of AI x-risk is a distraction from the real issues, which are whatever the people cared about previously.
I prefer an analogy to gain-of-function research. As in, scientists grow viruses/AIs in the lab, with promises of societal benefits, but without any commensurate acknowledgment of the risks. And you can’t trust the bio/AI labs to manage these risks, e.g. even high biosafety levels can’t entirely prevent outbreaks.
I agree that there is a consistent message here, and I think it is one of the most practical analogies, but I get the strong impression that tech experts do not want to be associated with environmentalists.
I think it would be persuasive to the left, but I’m worried that comparing AI x-risk to climate change would make it a left-wing issue to care about, which would make right-wingers automatically oppose it (upon hearing “it’s like climate change”).
Generally it seems difficult to make comparisons/analogies to issues that (1) people are familiar with and think are very important and (2) not already politicized.
I’m looking at this not from a CompSci point of view but from a rhetoric point of view: isn’t it much easier to make tenuous or even flat-out wrong links between Climate Change and highly publicized natural disaster events that have lots of dramatic, visceral footage than it is to ascribe danger to a machine that hasn’t been invented yet, whose nature and inclinations we don’t know?
I don’t know about nowadays, but for me the two main pop-culture touchstones for “evil AI” are Skynet in Terminator and HAL 9000 in 2001: A Space Odyssey (and, by inversion, the Butlerian Jihad in Dune). Wouldn’t it be more expedient to leverage those? (Expedient—I didn’t say accurate.)
Most ordinary people don’t know that no one understands how neural networks work (or even that modern “Generative A.I.” is based on neural networks). This might be an underrated message since the inferential distance here is surprisingly high.
It’s hard to explain the more sophisticated models that we often use to argue that human disempowerment is the default outcome; the message is perhaps much better leveraged to explain these three points:
1) No one knows how A.I. models / LLMs / neural nets work (with some explanation of how this is conceptually possible).
2) We don’t know how smart they will get, or how soon.
3) We can’t control what they’ll do once they’re smarter than us.
At least under my state of knowledge, this is also a particularly honest messaging strategy, because it emphasizes the fundamental ignorance of A.I. researchers.
“Optimization power” is not a scalar multiplying the “objective” vector. There are different types. It’s not enough to say that evolution has had longer to optimize things but humans are now “better” optimizers: Evolution invented birds and humans invented planes, evolution invented mitochondria and humans invented batteries. In no case is one really better than the other—they’re radically different sorts of things.
Evolution optimizes things in a massively parallel way, so that they’re robustly good at lots of different selectively relevant things at once, and has been doing this for a very long time so that inconceivably many tiny lessons are baked in a little bit. Humans work differently—we try to figure out what works for explainable, preferably provable reasons. We also blindly twiddle parameters a bit, but we can only keep so many parameters in mind at once and compare so many metrics—humanity has a larger working memory than individual humans, but the human innovation engine is still driven by linguistic theories, expressed in countable languages. There must be a thousand deep mathematical truths that evolution is already taking advantage of to optimize its DNA repair algorithms, or design wings to work very well under both ordinary and rare turbulent conditions, or minimize/maximize surface tensions of fluids, or invent really excellent neural circuits—without ever actually finding the elaborate proofs. Solving for exact closed-form solutions is often incredibly hard, even when the problem can be well-specified, but natural selection doesn’t care. It will find what works locally, regardless of logical depth. It might take humans thousands of years to work some of these details out on paper. But once we’ve worked something out, we can deliberately scale it further and avoid local minima. This distinction in strategies of evolution vs. humans rhymes with wisdom vs. intelligence—though in this usage intelligence includes all the insight, except insofar as evolution located and acts through us. As a sidebar, I think some humans prefer an intuitive strategy that is more analogous to evolution’s in effect (but not implementation).
So what about when humans turn to building a mind? Perhaps a mind is by its nature something that needs to be robust, optimized in lots of little nearly inexplicable ways for arcane reasons to deal with edge cases. After all, isn’t a mind exactly that which provides an organism/robot/agent with the ability to adapt flexibly to new situations? A plane might be faster than a bird, throwing more power at the basic aerodynamics, but it is not as flexible—can we scale some basic principles to beat out brains with the raw force of massive energy expenditure? Or is intelligence inherently about flexibility, and impossible to brute force in that way? Certainly it’s not logically inconsistent to imagine that flexibility itself has a simple underlying rule—as a potential existence proof, the mechanics of evolutionary selection are at least superficially simple, though we can’t literally replicate it without a fast world-simulator, which would be rather complicated. And maybe evolution is not a flexible thing, but only a designer of flexible things. So neither conclusion seems like a clear winner a priori.
The empirical answers so far seem to complicate the story. Attempts to build a “glass box” intelligence out of pure math (logic or probability) have so far not succeeded, though they have provided useful tools and techniques (like statistics) that avoid the fallacies and biases of human minds. But we’ve built a simple outer-loop optimization target called “next token prediction” and thrown raw compute at it, and managed to optimize black-box “minds” in a new way (called gradient descent by backpropagation). Perhaps the process we’ve captured is a little more like evolution, designing lots of little tricks that work for inscrutable reasons. And perhaps it will work, woe unto us, who have understood almost nothing from it!
With human texts, you also need reasons to trust the takeaways from them—things like bounded distrust from reputational incentives, your own understanding after treating something as steelmanning fodder, or the expectation that the authors are talking about what they actually observed. So it’s not particularly about alignment with humans either. Few of these things apply to LLMs, and they are not yet good at writing legible arguments worth verifying, though IMO gold is reason to expect this to change in a year or so.
(Epistemic status: I am signal boosting this with an explicit one-line summary that makes clear it is bearish for LLMs, because scary news about LLM capability acceleration is usually more visible/available than this update seems to be. Read the post for caveats.)
@ryan_greenblatt made a claim that continual learning/online training can already be done, but that right now the returns aren’t super high and it requires annoying logistical/practical work, and right now the limiting AI issues are elsewhere, like sample efficiency and robust self-verification.
That would explain the likelihood of getting AGI by the 2030s being pretty high:
Are you claiming that RL fine-tuning doesn’t change weights? This is wrong.
Maybe instead you’re saying “no one ongoingly does RL fine-tuning where they constantly are updating the weights throughout deployment (aka online training)”. My response is: sure, but they could do this, they just don’t because it’s logistically/practically pretty annoying and the performance improvement wouldn’t be that high, at least without some more focused R&D on making this work better.
My best guess is that the way humans learn on the job is mostly by noticing when something went well (or poorly) and then sample efficiently updating (with their brain doing something analogous to an RL update). In some cases, this is based on external feedback (e.g. from a coworker) and in some cases it’s based on self-verification: the person just looking at the outcome of their actions and then determining if it went well or poorly.
So, you could imagine RL’ing an AI based on both external feedback and self-verification like this. And, this would be a “deliberate, adaptive process” like human learning. Why would this currently work worse than human learning?
Current AIs are worse than humans at two things, which makes RL (quantitatively) much worse for them:
Robust self-verification: the ability to correctly determine when you’ve done something well/poorly in a way which is robust to you optimizing against it.
Sample efficiency: how much you learn from each update (potentially leveraging stuff like determining what caused things to go well/poorly which humans certainly take advantage of). This is especially important if you have sparse external feedback.
But, these are more like quantitative than qualitative issues IMO. AIs (and RL methods) are improving at both of these.
All that said, I think it’s very plausible that the route to better continual learning routes more through building on in-context learning (perhaps through something like neuralese, though this would greatly increase misalignment risks...).
For many (IMO most) useful tasks, AIs are limited by something other than “learning on the job”. At autonomous software engineering, they fail to match humans with 3 hours of time and they are typically limited by being bad agents or by being generally dumb/confused. To be clear, it seems totally plausible that for podcasting tasks Dwarkesh mentions, learning is the limiting factor.
Correspondingly, I’d guess the reason that we don’t see people trying more complex RL based continual learning in normal deployments is that there is lower hanging fruit elsewhere and typically something else is the main blocker. I agree that if you had human level sample efficiency in learning this would immediately yield strong results (e.g., you’d have very superhuman AIs with 10^26 FLOP presumably), I’m just making a claim about more incremental progress.
I think AIs will likely overcome poor sample efficiency to achieve a very high level of performance using a bunch of tricks (e.g. constructing a bunch of RL environments, using a ton of compute to learn when feedback is scarce, learning from much more data than humans due to “learn once deploy many” style strategies). I think we’ll probably see fully automated AI R&D prior to matching top human sample efficiency at learning on the job. Notably, if you do match top human sample efficiency at learning (while still using a similar amount of compute to the human brain), then we already have enough compute for this to basically immediately result in vastly superhuman AIs (human lifetime compute is maybe 3e23 FLOP and we’ll soon be doing 1e27 FLOP training runs). So, either sample efficiency must be worse or at least it must not be possible to match human sample efficiency without spending more compute per data-point/trajectory/episode.
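(Taking the 3e23 FLOP lifetime figure and the 1e27 FLOP training run above at face value, the implied ratio—with my own assumption of ~30 years as a “lifetime”:)

```python
# Numbers from the comment above, plus one assumption of mine:
# treating a "lifetime" as ~30 years of experience.
human_lifetime_flop = 3e23
training_run_flop = 1e27
seconds = 30 * 365 * 24 * 3600

print(f"implied brain throughput: {human_lifetime_flop / seconds:.0e} FLOP/s")        # ~3e14
print(f"human-lifetimes per 1e27 run: {training_run_flop / human_lifetime_flop:.0f}") # ~3000
```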
I still don’t think that a bunch of free-associating inner monologues talking to each other gives you AGI, and it still seems to be an open question whether adding RL on top just works.
The “hallucinations” of the latest reasoning models look more like capability failures than alignment failures to me, and I think this points towards “no.” But my credences are very unstable; if METR task length projections hold up or the next reasoning model easily zero-shots Pokemon I will just about convert.
Investigating preliminary evaluations of o3 and o4-mini I am more convinced that task length is scaling as projected.
Pokémon has fallen, but as far as I can tell this relied on scaffolding improvements for Gemini 2.5 pro customized during the run, NOT a new smarter model.
Overall, I am already questioning my position one week later.
Pokémon is actually load-bearing for your models? I’m imagining a counterfactual world in which Sonnet 3.7’s initial report involved it beating Pokémon Red, and I don’t think my present-day position would’ve been any different in it.
Even aside from the tons of walkthrough information present in LLMs’ training set, and iterative prompting allowing one to identify and patch holes in LLMs’ pretrained instinctive game knowledge, Pokémon is simply not a good test of open-ended agency. At the macro-scale, the game state can only progress forward, and progressing it requires solving relatively closed-form combat/navigational challenges. Which means that as long as you’re not too unlikely to blunder through each of those isolated challenges, you’re fated to “fail upwards”. The game-state topology doesn’t allow you to progress backward or get stuck in a dead end: you can’t lose a badge or un-win a boss battle. I.e.: there’s basically an implicit “long-horizon agency scaffold” built into the game.
Which means what this tests is mainly the ability to solve somewhat-diverse isolated challenges in sequence. But not the ability to autonomously decompose long-term tasks into said isolated challenges in a way such that the sequence of isolated challenges implacably points at the long-term task’s accomplishment.
I think the hallucinations/reward hacking are actually a real alignment failure, but an alignment failure that happens to degrade capabilities a lot. At least some of the misbehavior is probably due to context, but I have seen evidence that the alignment failures are more deliberate than regular capability failures.
That said, if this keeps happening, the likely explanation is that capabilities progress is to a significant degree bottlenecked on alignment progress, such that you need substantial progress on preventing specification gaming to get new capabilities. This would definitely be a good world for misalignment issues if the hypothesis is true (which I put some weight on).
(Also, it’s telling that the areas where RL has worked best are areas where you can basically create unhackable reward models, like many games/puzzles; once reward hacking is on the table, capabilities start to decrease.)
At a glance, it is (pretty convincingly) the smartest model overall. But progress still looks incremental, and I continue to be unconvinced that this paradigm scales to AGI. If so, the takeoff is surprisingly slow.
I don’t see it that way. Broad and deep knowledge is as useful as ever, and LLMs are no substitutes for it.
This anecdote comes to mind:
Dr. Pauling taught first-year chemistry at Cal Tech for many years. All of his exams were closed book, and the students complained bitterly. Why should they have to memorize Boltzmann’s constant when they could easily look it up when they needed it? I paraphrase Mr. Pauling’s response: I was always amazed at the lack of insight this showed. It’s what you have in your memory bank—what you can recall instantly—that’s important. If you have to look it up, it’s worthless for creative thinking.
He proceeded to give an example. In the mid-1930s, he was riding a train from London to Oxford. To pass the time, he came across an article in the journal, Nature, arguing that proteins were amorphous globs whose 3D structure could never be deduced. He instantly saw the fallacy in the argument—because of one isolated stray fact in his memory bank—the key chemical bond in the protein backbone did not freely rotate, as was argued. Linus knew from his college days that the peptide bond had to be rigid and coplanar.
He began doodling, and by the time he reached Oxford, he had discovered the alpha helix. A year later, his discovery was published in Nature. In 1954, Linus won the Nobel Prize in Chemistry for it. The discovery lies at the core of many of the great advances in medicine and pharmacology that have occurred since.
This fits with my experience. If you’re trying to do some nontrivial research or planning, you need to have a vast repository of high-quality mental models of diverse phenomena in your head, able to be retrieved in a split-second and immediately integrated into your thought process. If you need to go ask an LLM about something, this breaks the flow state, derails your trains of thought, and just takes dramatically more time. Not to mention unknown unknowns: how can you draw on an LLM’s knowledge about X if you don’t even know that X is a thing?
IMO, the usefulness of LLMs is in improving your ability to build broad and deep internal knowledge bases, rather than in substituting these internal knowledge bases.
This is probably right. Though perhaps one special case of my point remains correct: the value of a generalist as a member of a team may be somewhat reduced.
The value of a generalist with shallow knowledge is reduced, but you get a chance to become a generalist with relatively deep knowledge of many things. You already know the basics, so you can start the conversation with LLMs to learn more (and knowing the basics will help you figure out when the LLM hallucinates).
Back-of-the-envelope math indicates that an ordinary NPC in our world needs to double their power like 20 times over to become a PC. That’s a tough ask. I guess the lesson is either give up or go all in.
There are around 8 billion humans, so an ordinary person has a very small fraction of the power needed to steer humanity in any particular direction. A very large number of doublings are required to be a relevant factor.
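Making the back-of-the-envelope explicit (my reading of the numbers; what counts as “a relevant factor” here is an assumption):

$$2^{20}\approx 10^{6},\qquad \log_{2}\!\left(8\times 10^{9}\right)\approx 33,$$

so 20 doublings takes a baseline individual to roughly the combined power of a million people, about 1/8000 of humanity, while matching humanity's combined power outright would take about 33 doublings.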
That’s an interesting idea. However, people who read these comments probably already have power much greater than the baseline—a developed country, high intelligence, education, enough money and free time to read websites...
Not sure how many of those 20 doublings still remain.
I thought the statement was pretty clearly not about the average lesswronger.
But in terms of the “call to action”: 20 was pretty conservative, so I think it’s still in that range, and it doesn’t change the conclusions one should draw much.
That moment when you want to be updateless about risk but updateful about ignorance, but the basis of your epistemology is to dissolve the distinction between risk and ignorance.
The bat thing might have just been Thomas Nagel, I can’t find the source I thought I remembered.
At one point I said LLMs forget everything they thought previously between predicting (say) token six and seven and have to work from scratch. Because of the way the attention mechanism works, it is actually a little more complicated (see the top comment from hmys). What I said is (I believe) still overall right, but I would put that detail less strongly.
Hofstadter apparently was the one who said a human-level chess AI would rather talk about poetry.
Garry Kasparov would beat me at chess in some way I can’t predict in advance. However, if the game starts with half his pieces removed from the board, I will beat him by playing very carefully. The first above-human-level A.G.I. seems overwhelmingly likely to be down a lot of material—massively outnumbered, running on our infrastructure, starting with access to pretty crap/low bandwidth actuators in the physical world and no legal protections (yes, this actually matters when you’re not as smart as ALL of humanity—it’s a disadvantage relative to even the average human). If we exercise even a modicum of competence, its position will be even tougher (e.g. an air gap, dedicated slightly weaker controllers, exposed thoughts at some granularity). If the chess metaphor holds, we should expect the first such A.G.I. not to beat us—but it may well attempt to escape under many incentive structures. Does this mean we should expect to have many tries to solve alignment?
If you think not, it’s probably because of some dis-analogy with chess. For instance, the search space in the real world is much richer, and maybe there are always some “killer moves” available if you’re smart enough to see them e.g. invent nanotech. This seems to tie in with people’s intuitions about A) how fragile the world is and B) how g-loaded the game of life is. Personally I’m highly uncertain about both, but I suspect the answers are “somewhat.”
I would guess that A.G.I. that only wants to end the world might be able to pull it off with slightly superhuman intelligence, which is very scary to me. But I think it would actually be very hard to bootstrap all singularity-level infrastructure from a post-apocalyptic wasteland, so perhaps this is actually not a convergent instrumental subgoal at this level of intelligence.
Is life actually much more g-loaded than chess? In terms of how far you can in principle multiply your material, unequivocally yes. However, life is also more stochastic—I will never beat Garry Kasparov in a fair game, but if Jeff Bezos and I started over with ~0 dollars and no name recognition / average connections today, I think there’s a good >1% chance I’m richer in a year. It’s not immediately clear to me which view is more relevant here.
I suspect that human minds are vast (more like little worlds of our own than clockwork baubles) and even a superintelligence would have trouble predicting our outputs accurately from (even quite) a few conversations (without direct microscopic access) as a matter of sample complexity.
There is a large body of non-AI literature that already addresses this, for example the research of Gerd Gigerenzer, which shows that heuristics and “fast and frugal” decision trees often substantially outperform fine-grained analysis because of the sample-complexity issue you mention.
Pop frameworks which elaborate on this, and how it may be applied, include David Snowden’s Cynefin framework, which is geared toward government and organizations, and of course Nassim Nicholas Taleb’s Incerto.
I seem to recall also that the gist of Dunbar’s Number, and the reason why certain parrots and corvids seem to have larger prefrontal-cortex equivalents than non-monogamous birds, is basically so that they can have an internal model of their mating partner. (This is very interesting to think about in terms of intimate human relationships, what I’d poetically describe as the “telepathy” when you wordlessly communicate, intuit, and predict a wide range of each other’s complex and specific desires and actions because you’ve spent enough time together.)
The scary thought to me is that a superintelligence would quite simply not need to accurately model us; it would just need to fine-tune its models in a way not dissimilar from the psychographic models utilized by marketers. Of course, that operates at scale, so the margin of error is much greater but more ‘acceptable’.
Indeed, dumb algorithms already do this very well—think about how ‘addictive’ people claim their TikTok or Facebook feeds are, or the rudimentary sensationalist clickbait that ensures eyeballs and clicks. A superintelligence doesn’t need accurate modelling—and this is without having individual conversations with us: to my knowledge (or rather in my experience), most social media algorithms are really bad at taking the information on your profile and using things like sentiment and discourse analysis to make decisions about which content to feed you; they rely on engagement signals like sharing, clicking like, watch time, and other rudimentary metrics. Similarly, the content creators are often casting a wide net and using formulas to produce this content.
A superintelligence, I wager, would not need accuracy to be capable of psychological tactics geared to the individual that the Stasi who operated Zersetzung could only dream of. Marketers must be drooling at the possibility of marketing campaigns orders of magnitude more effective, ones that would make one-to-one sales obsolete.
One can showcase very simple examples of data that are easy to generate (a simple data source) yet very hard to predict.
E.g., there is a 2-state generating hidden Markov model whose optimal prediction model requires infinitely many hidden states.
I’ve heard it explained as follows: it’s much harder for the fox to predict where the hare is going than it is for the hare to decide where to go to shake off the fox.
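For anyone who wants to see this concretely, here is a minimal simulation sketch (the transition probabilities are illustrative, in the spirit of the commonly cited “simple nonunifilar source”, not necessarily the exact construction referenced above). Tracking the Bayesian belief over the generator’s two hidden states shows a new, distinct belief for every run length, so an optimal predictor needs unboundedly many states:

```python
import numpy as np

# A 2-state edge-emitting HMM (illustrative parameters): from state A, emit 0
# and stay in A, or emit 1 and move to B, each with probability 1/2; from
# state B, always emit 1, then stay in B or return to A, each with probability 1/2.
T = {
    0: np.array([[0.5, 0.0],    # A --0--> A
                 [0.0, 0.0]]),  # (B never emits 0 in this toy model)
    1: np.array([[0.0, 0.5],    # A --1--> B
                 [0.5, 0.5]]),  # B --1--> A or B
}

def belief_after(symbols, prior=(0.5, 0.5)):
    """Bayesian belief over the hidden states after observing `symbols`."""
    b = np.array(prior, dtype=float)
    for x in symbols:
        b = b @ T[x]     # joint weight of emitting x and landing in each state
        b = b / b.sum()  # normalized belief = the optimal predictor's internal state
    return b

# Beliefs after a 0 followed by a run of 1s: each run length yields a distinct
# belief (P(A) = 0, 1/2, 1/3, 2/5, 3/8, ... -- ratios of Fibonacci numbers that
# never repeat), so the optimal predictor has infinitely many states even
# though the generator has only two.
beliefs = {tuple(np.round(belief_after([0] + [1] * n), 12)) for n in range(1, 15)}
print(len(beliefs), "distinct belief states from 14 run lengths")
```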
I’m starting a Google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues (infrequently) added; also, some people outside of LessWrong are interested.
Simple argument that imitation learning is the easiest route to alignment:
Any AI aligned to you needs to represent you in enough detail to fully understand your preferences / values, AND maintain a stable pointer to that representation of value (that is, it needs to care). The second part is surprisingly hard to get exactly right.
Imitation learning basically just does the first part—it builds a model of you, which automatically contains your values, and by running that model optimizes your values in the same way that you do. This has to be done faithfully for the approach to work safely—the model has to continue acting like you would in new circumstances (out of distribution) and when it runs for a long time—which is nontrivial.
That is, faithful imitation learning is kind of alignment-complete: it solves alignment, and any other solution to alignment kind of has to solve imitation learning implicitly, by building a model of your preferences.
I think people (other than @michaelcohen) mostly haven’t realized this for two reasons: the idea doesn’t sound sophisticated enough, and it’s easy to point at problems with naive implementations.
Imitation learning is not a new idea, so you don’t sound very smart or informed by suggesting it as a solution.
And implementing it faithfully does face barriers! You have to solve “inner optimization problems” which basically come down to the model generalizing properly, even under continual / lifelong learning. In other words, the learned model should be a model in the strict sense of simulation (perhaps at some appropriate level of abstraction). This really is hard! And I think people assume that anyone suggesting imitation learning can be safe doesn’t appreciate how hard it is. But I think it’s hard in the somewhat familiar sense that you need to solve a lot of tough engineering and theory problems—and a bit of philosophy. However it’s not as intractably hard as solving all of decision theory etc. I believe that with a careful approach, the capabilities of an imitation learner do not generalize further than its alignment, so it is possible to get feedback from reality and iterate—because the model’s agency is coming from imitating an agent which is aligned (and with care, is NOT emergent as an inner optimizer).
Also, you still need to work out how to let the learned model (hopefully a faithful simulation of a human) recursively self-improve safely. But notice how much progress has already been made at this point! If you’ve got a faithful simulation of a human, you’re in a very different and much better situation. You can run that simulation faster as technology advances, meaning you aren’t immediately left in the dust by LLM scaling—you can have justified trust in an effectively superhuman alignment researcher. And recursive self-improvement is probably easier than alignment from scratch.
I think we need to take this strategy a lot more seriously.
Thinking times are now long enough that in principle frontier labs could route some API (or chat) queries to a human on the backend, right? Is this plausible? Could this give them a hype advantage in the medium term if they picked the most challenging (for LLMs) types of queries effectively, and if so, is there any technical barrier? I can see this kind of thing eventually coming out, if the Wentworth “it’s bullshit though” frame turns out to be partially right.
(I’m not suggesting they would do this kind of blatant cheating on benchmarks, and I have no inside knowledge suggesting this has ever happened)
I seem to recall EY once claiming that insofar as any learning method works, it is for Bayesian reasons. It just occurred to me that even after studying various representation and complete class theorems I am not sure how this claim can be justified—certainly one can construct working predictors for many problems that are far from explicitly Bayesian. What might he have had in mind?
So is the fascination with applying math to complex real-world problems (like alignment) when the necessary assumptions don’t really fit the real-world problem.
Beauty of notation is an optimization target and so should fail as a metric, but especially compared to other optimization targets I’ve pushed on, in my experience it seems to hold up. The exceptions appear to be string theory and category theory, and two failures in a field the size of math is not so bad.
I wonder if it’s true that around the age of 30 women typically start to find babies cute and consequently want children, and if so, is this cultural or evolutionary? It’s sort of against my (mesa-optimization) intuitions for evolution to act on such high-level planning (it seems that finding babies cute can only lead to reproductive behavior through pretty conscious intermediary planning stages). Relatedly, I wonder if men typically have a basic urge to father children, beyond immediate sexual attraction?
Eliezer’s form of moral realism about good (as a real but particular shared concept of value which is not universally compelling to minds) seems to imply that most of us prefer to be at least a little bit evil, and can’t necessarily be persuaded otherwise through reason.
Seems right.
And Nietzsche would probably argue the two impulses towards good and evil aren’t really opposites anyway.
6: Wait, this question only seems salient to me because I’m driving a flesh-robot evolved under contingent conditions. The regularities in human sex-associated categories are of very little importance to me on reflection, except as a proxy for (others’) sanity. I should shut up about gender and focus on solving X-risk.
7: ok, but a lot of why I want to solve x-risk is that I have a positive view of a human+ world where people have radical transhumanism of various kinds available, including being able to do things like full-functioning style transfer of their body in various currently-weird ways, which goes far beyond gender into things like: what if I replace my dna with something more efficient that keeps everything beautiful about dna but works at 3 kelvin in deep space while I make art in orbit of pluto for the next 10,000 years, and also I still want to look like an attractive ape even though at that point I won’t really be one
I don’t get why this take was so much more popular than my post though—like, do people really think that winning the culture war on gender will affect whether we can have a transhumanist future after the singularity? This does not make sense to me.
I’d guess it’s levels of political spam machine derived messaging, scissor-statement-ish stuff. I strong agree’d and strong downvoted your post because of that. I guess mine was sufficiently galaxy brained to dodge the association somewhat? also, like, I’m pitching something to look forward to—a thing to seek, in addition to the normal thing to avoid. and it seems like a lot of the stuff that backs the current disagreements is more obviously solved if you have tech advanced enough to do what I was describing. idk though.
Being able to do weird stuff with body swapping is like 2.5% of the reason I want to solve x-risk, but more power to you—by choosing increasing natural number labels, I tacitly implied the existence of higher levels, and if this is yours then go for it :)
Where is the hard evidence that LLMs are useful?
Has anyone seen convincing evidence of AI driving developer productivity or economic growth?
It seems I am only reading negative results about studies on applications.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
https://www.lesswrong.com/posts/25JGNnT9Kg4aN5N5s/metr-research-update-algorithmic-vs-holistic-evaluation
And in terms of startup growth:
https://www.lesswrong.com/posts/hxYiwSqmvxzCXuqty/generative-ai-is-not-causing-ycombinator-companies-to-grow
apparently wider economic measurements are not clear?
Also agency still seems very bad, about what I would have expected from decent scaffolding on top of GPT-3:
https://www.lesswrong.com/posts/89qhQH8eHsrZxveHp/claude-plays-whatever-it-wants
(Plus ongoing poor results on Pokémon, modern LLMs still can only win with elaborate task-specific scaffolding)
Though performance on the IMO seems impressive, the very few examples of mathematical discoveries by LLMs don’t seem (to me) to be increasing much in either frequency or quality, and so far are mostly of type “get a better lower bound by combinatorially trying stuff” which seems to advantage computers with or without AI. Also, again, even that type of example is rare, probably the vast majority of such attempts have failed and we only hear about a few successful ones, none of which seem to have been significant for any reason other than coming from an LLM.
I increasingly suspect a lot of the recent progress in LLMs has been illusory, from overfitting to benchmarks which may even leak to the training set (am I right about this?) and seeming useful, and METR is sufficiently good at their job that this will become apparent in task length measurements before the 8 hour mark.
I’m trying to make belief in rapid LLM progress pay rent, and at some point benchmarks are not the right currency. Maybe that point is “not yet” and we see useful applications only right before superintelligence etc. but I am skeptical of that narrative; at least, it does little to justify short timelines, because it leaves the point of usefulness to guesswork.
Are you looking for utility in all the wrong places?
Recent news have quite a few mentions of: AI tanking the job prospects of fresh grads across multiple fields and, at the same time, AI causing a job market bloodbath in the usual outsourcing capitals of the world.
That sure lines up with known AI capabilities.
AI isn’t at the point of “radical transformation of everything” yet, clearly. You can’t replace a badass crew of x10 developers who can build the next big startup with AIs today. AI doesn’t unlock all that many “things that were impossible before” either—some are here already, but not enough to upend everything. What it does instead is take the cheapest, most replaceable labor on the market, and make it cheaper and more replaceable. That’s the ongoing impact.
idk if these are good search results, but I asked Claude to look up whether the citations seem to justify the claim; if we care about the results, someone should read the articles for real
Yep, that’s what I’ve seen.
The “entry-level jobs” study looked alright at a glance. I did not look into the claims of outsourcing job losses in any more detail—only noted that it was claimed multiple times.
Citation needed
I’m not saying it’s a bad take, but I asked for strong evidence. I want at least some kind of source.
There’s this recent paper, see Zvi’s summary/discussion here. I have not looked into it deeply. Looks a bit weird to me. Overall, the very fact that there’s so much confusion around whether LLMs are or are not useful is itself extremely weird.
(Disclaimer: off-the-cuff speculation, no idea if that is how anything works.)
I’m not sure how much I buy this narrative, to be honest. The kind of archetypical “useless junior dev” who can be outright replaced by an LLM probably… wasn’t being hired to do the job anyway, but instead as a human-capital investment? To be transformed into a middle/senior dev, whose job an LLM can’t yet do. So LLMs achieving short-term-capability parity with juniors shouldn’t hurt juniors’ job prospects, because they weren’t hired for their existing capabilities anyway.
Hmm, perhaps it’s not quite like this. Suppose companies weren’t “consciously” hiring junior developers as a future investments; that they “thought”[1] junior devs are actually useful, in the sense that if they “knew” they were just a future investment, they wouldn’t have been hired. The appearance of LLMs who are as capable as junior devs would now remove the pretense that the junior devs provide counterfactual immediate value. So their hiring would stop, because middle/senior managers would be unable to keep justifying it, despite the quiet fact that they were effectively not being hired for their immediate skills anyway. And so the career pipeline would get clogged.
Maybe that’s what’s happening?
(Again, no idea if that’s how anything there works, I have very limited experience in that sphere.)
In a semi-metaphorical sense, as an emergent property of various social dynamics between the middle managers reporting on juniors’ performance to senior managers who set company priorities based in part on what would look good and justifiable to the shareholders, or something along those lines.
This is the hardest evidence anyone has brought up in this thread (?), but I’m inclined to buy your rebuttal about the trend really starting in 2022, which is hard to believe comes from LLMs.
I don’t think it’s reasonable to expect such evidence to appear after such a short period of time. There was no hard evidence that electricity was useful in the sense you are talking about until the 1920s. Current LLMs are clearly not AGIs in the sense that they can integrate into the economy like migrant labor; therefore, productivity gains from LLMs are bottlenecked on users.
I find this reply broadly reasonable, but I’d like to see some systematic investigations of the analogy between gradual adoption and rising utility of electricity and gradual adoption and rising utility of LLMs (as well as other “truly novel technologies”).
That’s interesting, but adoption of LLMs has been quite fast.
There is a difference between adoption as in “people are using it” and adoption as in “people are using it in an economically productive way”. I think the supermajority of productivity gains from LLMs are realized as pure consumer surplus right now.
I understand your theory.
However I am asking in this post for hard evidence.
If there is no hard evidence, that doesn’t prove a negative, but it does mean a lot of LW is engaging in a heavy amount of speculation.
My impression is that, so far, the kinds of people whose work could be automated aren’t the kind to navigate the complexities of building bespoke harnesses to have LLMs do useful work. So we have the much slower process of people manually automating others.
The part where you have to build bespoke harnesses seems suspicious to me.
What if, you know, something about how the job needs to be done changes?
I want an option to filter for writing with zero LLM influence.
I do not trust LLMs and I am not sure how I feel about LLM / human collaboration. As systems become more powerful, I am worried that they may become a serious epistemic hazard, up to and including actually hacking my brain. I would like to be able to protect myself from this aggressively.
For that reason, I think that the current LW policy on LLM usage is insufficient. Every post that uses an LLM in any part of its production process whatsoever should be flagged as such. Personally, I am currently willing to accept some LLM usage upstream of the writing I read, and I would not routinely filter such posts out of my feed, but I would like the option to do so (which I would occasionally use as a sanity check) very aggressively and with no exceptions. Basically, an off-switch.
I would also like to be able to filter out any writing of which even a single word is LLM generated (except perhaps parenthetically). I think I would use this option routinely, though perhaps I would also like to exempt specific users (e.g. authors I have followed). But this softer option should allow consultation with LLMs, experiments with LLMs, etc.
I consider it epistemic violence that I occasionally discover, after the fact, that an LLM was used extensively in the writing process of a post.
I think extensive use of LLM should be flagged at the beginning of a post, but “uses an LLM in any part of its production process whatsoever” would probably result in the majority of posts being flagged and make the flag useless for filtering. For example I routinely use LLMs to check my posts for errors (that the LLM can detect), and I imagine most other people do so as well (or should, if they don’t already).
Unfortunately this kind of self-flagging/reporting is ultimately not going to work, as far as individually or societally protecting against AI-powered manipulation, and I doubt there will be a technical solution (e.g. AI content detector or other kind of defense) either (short of solving metaphilosophy). I’m not sure it will do more good than harm even in the short run, because it can give a false sense of security and punish the honest / reward the dishonest, but I still lean towards trying to establish “extensive use of LLM should be flagged at the beginning of a post” as a norm.
My own data point: for the vast majority of my posts, there is zero LLM involved at any stage.
I recently, rather embarrassingly, made a post with a massive error which an LLM would have found immediately. I seriously misread a paper in a way that cut/pasting the paper and the post into Claude and asking “any egregious misreadings” would have stopped me from making that post. This is far too useful for me to turn down, and this kind of due diligence is +EV for everyone.
Yes, I mostly agree. Unless the providers themselves log all responses and expose some API to check for LLM generation, we’re probably out of luck here, and the incentives to defect are strong.
One thing I was thinking about (similar to what, e.g., speedrunners do) is just making a self-recording or screen recording of actually writing out the content/post. This could probably be verified by an AI or a neutral third party. Something like a “proof of work” for writing your own content.
Grammarly has https://www.grammarly.com/authorship if you want to prove that you wrote something.
If it became common to demand and check proofs of (human) work, there would be a strong incentive to use AI to generate such proofs, which doesn’t seem very hard to do.
I don’t expect the people on LW that I read to intentionally lie about stuff.
Maybe we want a multi-level categorization scheme instead? Something like:
Level 0: Author completely abstains from LLM use in all contexts (not just this post)
Level 1: Author uses LLMs but this particular post was made with no use of LLM whatsoever
Level 2: LLM was used (e.g. to look up information), but no text/images in the post came out of LLM
Level 3: LLM was used for light editing and/or image generation
Level 4: LLM was used for writing substantial parts
Level 5: Mostly LLM-generated with high-level human guidance/control/oversight
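A minimal sketch of how a scheme like this might be encoded as post metadata (everything here is hypothetical, not an existing LW feature):

```python
from enum import IntEnum

class LLMDisclosure(IntEnum):
    """Hypothetical disclosure levels mirroring the list above."""
    NO_LLM_EVER = 0          # author abstains from LLM use in all contexts
    NONE_THIS_POST = 1       # author uses LLMs, but not for this post
    LOOKUP_ONLY = 2          # LLM consulted, but no LLM text/images in the post
    LIGHT_EDITING = 3        # light editing and/or image generation
    SUBSTANTIAL_WRITING = 4  # LLM wrote substantial parts
    MOSTLY_GENERATED = 5     # mostly LLM-generated, with human guidance/oversight

def visible(post_level: LLMDisclosure, reader_max: LLMDisclosure) -> bool:
    """Reader-side filter: show only posts at or below the reader's threshold."""
    return post_level <= reader_max

# e.g. the "off-switch" described upthread would be reader_max = NONE_THIS_POST:
print(visible(LLMDisclosure.LIGHT_EDITING, LLMDisclosure.NONE_THIS_POST))  # False
```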
any reason not to just start doing that as post tags? no convenient way to do it for comments though.
This is an edge case, but just flagging that it’s a bit unclear to me how to apply this to my own post in a useful way. As I’ve disclosed in the post itself:
Does this count as Level 3 or 4? o3 provided a substantial idea, but the resulting proof was entirely written down by myself. I’m also unsure whether the full drafting of precisely one paragraph (which summarizes the rest of the post) by GPT-5 counts as editing or the writing of substantial parts.
We need another “level” here, probably parallel to the others, for when LLMs are used for idea-generation, criticism of outlines, as a discussion partner, et cetera. For instance, let’s say I think about countries that are below their potential in some tragic way, like Russia and Iran: countries with loads of cultural capital and an educated population, that historically have lots going for them. Then I can ask an LLM “any other countries like that?” and it might mention, say, North Korea, Iraq and Syria, maybe Greece or Turkey or South Italy, with some plausible story attached to them. When I do this interaction with an LLM, the end product is going to be colored by it. If I initially intended to talk about how Russia and Iran have been destroyed by some particular forms of authoritarianism, my presentation, hypothesis, or whatever, will likely be modified so I can put Greece and Iraq into the same bucket. This alters my initial thoughts and probably changes my thought-generation process into a mold more or less shaped by the LLM, “hacking my brain”. When this happens across many posts, it’s likely to homogenize writing not in style but in semantic content.
This example is kinda weak, but I think this is the kind of thing OP is worried about. But I’d be curious to hear stronger examples if anyone can think of them.
I use LLMs for basically anything substantial that I write. Like, a lot of my knowledge of random facts about the world is downstream of having asked LLMs about it. It would be IMO pretty dumb to write a post that is e.g. trying to learn from past social movement failures and not have an LLM look over it to see whether it’s saying anything historically inaccurate.
So I do think there needs to be some bar here that is not “LLMs were involved in any way”. I do share a bunch of concerns in the space.
Uhhh… that seems maybe really bad. Do you sometimes do the kind of check which, if it were applied to The New York Times pre-AI, would be sufficient to make Gell-Mann Amnesia obvious?
Personally, the most I’ve relied on LLMs for a research project was the project behind this shortform in February 2025, and in hindsight (after reading up on some parts more without an LLM) I think I ended up with a very misleading big picture as a result. I no longer use LLMs for open-ended learning like that; it was worth trying but not a good idea in hindsight.
Do you then look over what the LLM has said and see whether it’s saying anything historically inaccurate, without using an LLM?
Not most of the time! Like, I sometimes ask multiple LLMs, but I don’t verify every fact that an LLM tells me, unless it’s a domain where I predict LLMs are particularly likely to hallucinate. I keep in mind that stuff is sometimes hallucinated, but most of the time it’s fine to know that something is quite probably true.
There’s no such thing as “a domain where LLMs are particularly likely to hallucinate”. In every domain there’s some obscure jagged boundary, not very far from normal standard questions to ask, where LLMs will hallucinate, usually plausibly to a non-expert.
To me, this sounds like you’re simply pushing the problem a little bit downstream without actually addressing it. You’re still not verifying the facts; you’re just getting another system with similar flaws to the first (you). You aren’t actually fact checking at any point.
That is not how Bayesian evidence works. I treat LLM output as somewhat less trustworthy than what a colleague of mine says, but not fundamentally different. I am skeptical that you spend your days double-checking every conversation you have with another human. I also don’t think you should spend your days double-checking every single thing an LLM tells you.
This feels kind of like the early conversations about Wikipedia where people kept trying to insist Wikipedia is “not a real source”.
If you’re asking a human about some even mildly specialized topic, like history of Spain in the 17th century or different crop rotation methods or ordinary differential equations, and there’s no special reason that they really want to appear like they know what they’re talking about, they’ll generally just say “IDK”. LLMs are much less like that IME. I think this is actually a big difference in practice, at least in the domains I’ve tried (reproductive biology). LLMs routinely give misleading / false / out-of-date / vague-but-deceptively-satiating summaries.
I agree the LLMs are somewhat worse, especially compared to rationalist-adjacent experts in specialized fields, but they really aren’t that bad for most things. Like I researched the state of the art of datacenter security practices yesterday, and I am not like 99% confident that the AI got everything right, but I am pretty sure it helped me understand the rough shape of things a lot better.
This seems fine and good—for laying some foundations, which you can use for your own further theorizing, which will make you ready to learn from more reliable + rich expert sources over time. Then you can report that stuff. If instead you’re directly reporting your immediately-post-LLM models, I currently don’t think I want to read that stuff, or would want a warning. (I’m not necessarily pushing for some big policy, that seems hard. I would push for personal standards though.)
Fwiw, in my experience LLMs lie far more than early Wikipedia or any human I know, and in subtler and harder to detect ways. My spot checks for accuracy have been so dismal/alarming that at this point I basically only use them as search engines to find things humans have said.
I am wondering whether your experiences were formed via the first generation of reasoning models, and my guess is you are also thinking of asking different kinds of questions.
The thing that LLMs are really great at is to speak and think in the ontology and structure that is prevalent among experts in any field. This is usually where the vast majority of evidence comes from. LLMs aren’t going to make up whole ontologies about how bankruptcy law works, or how datacenter security works. It might totally make up details, but it won’t make up the high-level picture.
Second, this has just gotten a lot better over the last 6 months. GPT-5 still lies a good amount, but vastly less than o1 or o3. I found o1 almost unusable on this dimension.
Datapoint: I’m currently setting up a recording studio at Lighthaven, and I am using them all the time to get guides for things like “how to change a setting on this camera” or “how to use this microphone” or “how to use this recording software”.
Yes, they confabulate menus and things a lot, but as long as I keep uploading photos of what I actually see, they know the basics much better than me (e.g. what bit rate to set the video vs the audio, where to look to kill the random white noise input I’m getting, etc).
I’d say they confabulate like 50% of the time but that they’re still a much more effective search engine for me than google, and can read the manual much faster than me. My guess is I simply couldn’t do some of the projects I’m doing without them.
It’s perfectly fine to have strong personal preferences for what content you consume, and how it’s filtered, and to express these preferences. I don’t think it’s cool to make hyperbolic accusations of violence. It erodes the distinctions we make between different levels of hostility that help prevent conflicts from escalating. I don’t think undisclosed LLM assistance can even be fairly characterized as deceptive, much less violent.
I don’t think it’s hyperbolic at all; I think this is in fact a central instance of the category I’m gesturing at as “epistemic violence.” For instance, p-hacking, lying, manipulation, misleading data, etc. If you don’t think that category is meaningful or you dislike my name for it, can you be more specific about why? Or why this is not an instance? Another commenter @Guive objected to my usage of the word violence here because “words can’t be violence” which is I think a small skirmish of a wider culture war which I am really not trying to talk about.
To be explicit (again) I do not in any way want to imply that somehow a person using an LLM without disclosing it justifies physical violence against them. I also don’t think it’s intentionally an aggression. But depending on the case, it CAN BE seriously negligent towards the truth and community truth seeking norms, and in that careless negligence it can damage the epistemics of others, when a simple disclaimer / “epistemic status” / source would have been VERY low effort to add. I have to admit I hesitate to say this so explicitly a bit because many people I respect use LLMs extensively, and I am not categorically against this, and I feel slightly bad about potentially burdening or just insulting them—generally speaking I feel some degree of social pressure against saying this. And as a result I hesitate to back down from my framing, without a better reason than that it feels uncomfortable and some people don’t like it.
The way I think about this is a bit more like “somehow, we need immune systems against arbitrary nuanced persuasion.” Which is for sure a very hard problem, but, I don’t think simple tricks of “check if LLM influenced” will turn out to be that useful.
It seems like a good start—for instance, it would be potentially useful data.
I think at the very least you want more metadata about how the AI was used.
Something like “somehow automatically track metadata about how documents came to be and include it”, the way you might try to do with photography. (I guess the metaphor here is more like “have text documents automatically include info about what text entered via “paste” instead of by typing manually?”)
It tends to be bad (or at least costly) to have a rule that has the property that violations of the rule cannot reliably be detected, which leads to the question of how you propose to detect LLM-written content.
I agree. I really don’t like it. Like Buck posted earlier
And I don’t think he did anything against lesswrong rules, or anything immoral really, but I still really don’t like it.
If it was up to me, we’d have a rule that every single word in your post should either be physically typed by you, or be in quotation marks.
So it’s fine if you copy your article into some AI and ask it to find grammar mistakes, as long as you go and fix them yourself.
It’s also fine to have a fair bit of LLM involvement in the post, even conceptual stuff and writing, as long as the finished product is typed up by you.
That way I know every single word has at least passed through the brain of the author.
I don’t really believe there is any such thing as “epistemic violence.” In general, words are not violence.
Semantics; it’s obviously not equivalent to physical violence.
From the moment I understood the weakness of my flesh, it disgusted me
UPDATE: the blog post I referred to from Scott Aaronson has now been updated to reflect that A) the trick from GPT-5 “should have been obvious” but more importantly a human has already come up with a better trick (directly replacing the one from GPT-5) which resolves an open problem left in the paper. To me, this opens the possibility that GPT-5 may have net slowed down the overall research process (do we actually want papers to be written faster, but of possibly-lower quality?) though I would guess it was still a productivity booster. Still, I am less impressed with this result than I was at first.
Awhile back, I claimed that LLMs had not produced original insights, resulting in this question: https://www.lesswrong.com/posts/GADJFwHzNZKg2Ndti/have-llms-generated-novel-insights
To be clear, I wasn’t talking about the kind of weak original insight like saying something technically novel and true but totally trivial or uninteresting (like, say, calculating the value of an expression that was maybe never calculated before but could be calculated with standard techniques). Obviously, this is kind of a blurred line, but I don’t think it’s an empty claim at all: an LLM could falsify my claim by outputting the proof of a conjecture that mathematicians were interested in.
At the time, IMO no one could come up with a convincing counter example. Now, I think the situation is a lot less clear, and it’s very possible that this will in retrospect be the LAST time that I can reasonably claim that what I said holds up. For instance, GPT-5 apparently helped Scott Aaronson prove a significant result: https://scottaaronson.blog/?p=9183#comments
This required some back and forth iteration where it made confident mistakes he had to correct. And, it’s possible that this tiny part of the problem didn’t require original thinking on its own.
However, it’s also possible that I am actually just on copium and should admit I was wrong (or at least, what I said then is wrong now). I’m not sure. Anything slightly more convincing than this would be enough to change my mind.
I’m aware of various small improvements to combinatorial bounds, usually either from specialized systems or not hard enough to be interesting, or (usually) both. Has anyone seen anything beyond this (and beyond Aaronson’s example)?
For my part, I now (somewhat newly) find LLMs useful as a sort of fuzzy search engine which can be used before the real search engine to figure out what to search, which includes usefulness for research, but certainly does not include DOING research.
Some signal: Daniel Litt, the mathematician who seems most clued-in regarding LLM use, still doesn’t think there have been any instances of LLMs coming up with new ideas.
I’m currently watching this space closely, but I don’t think anything so far has violated my model. LLMs may end up useful for math in the “prove/disprove this conjecture” way, but not in the “come up with new math concepts (/ideas)” way.
Ah, though perhaps our cruxes there differed from the beginning, if you count “prove a new useful conjecture” as a “novel insight”. IMO, that’d only make them good interactive theorem provers, and wouldn’t bear much on the question of “can they close the loop on R&D/power the Singularity”.
Meta: I find the tone of your LW posts and comments to be really good in some way and I want to give positive feedback and try to articulate what I like about the vibe.
I’d describe it as something like: independent thinking/somewhat original takes expressed in a chill, friendly manner, without deliberate contrarianism, and scout mindset but not in a performative way. Also no dunking on stuff. But still pushing back on arguments you disagree with.
To my tastes this is basically optimal. Hope this comment doesn’t make you overthink it in the future. And maybe this can provide some value to others who are thinking about what style to aim for/promote.
Edit: maybe a short name for it is being intellectually disagreeable while being socially agreeable? Idk that’s probably an oversimplification.
I think it might be worthwhile to distinguish cases where LLMs came up with a novel insight on their own vs. were involved, but not solely responsible.
You wouldn’t credit Google for the breakthrough of a researcher who used Google when making a discovery, even if the discovery wouldn’t have happened without the Google searches. The discovery maybe also wouldn’t have happened without the eggs and toast the researcher had for breakfast.
“LLMs supply ample shallow thinking and memory while the humans supply the deep thinking” is a different and currently much more believable claim than “LLMs can do deep thinking to come up with novel insights on their own.”
I agree; I just want to flag it if the goalposts are moving from “no novel insights” to “no deep thinking.”
In my view, you don’t get novel insights without deep thinking except extremely rarely by random, but you’re right to make sure the topic doesn’t shift without anyone noticing.
Full Scott Aaronson quote in case anyone else is interested:
(couldn’t resist including that last sentence)
A couple more (recent) results that may be relevant pieces of evidence for this update:
A multimodal robotic platform for multi-element electrocatalyst discovery
“Here we present Copilot for Real-world Experimental Scientists (CRESt), a platform that integrates large multimodal models (LMMs, incorporating chemical compositions, text embeddings, and microstructural images) with Knowledge-Assisted Bayesian Optimization (KABO) and robotic automation. [...] CRESt explored over 900 catalyst chemistries and 3500 electrochemical tests within 3 months, identifying a state-of-the-art catalyst in the octonary chemical space (Pd–Pt–Cu–Au–Ir–Ce–Nb–Cr) which exhibits a 9.3-fold improvement in cost-specific performance.”
Generative design of novel bacteriophages with genome language models
“We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism [...] Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. [...] This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.”
I don’t feel equipped to assess this.
FWIW, my understanding is that Evo 2 is not a generic language model that is able to produce innovations, it’s a transformer model trained on a mountain of genetic data which gave it the ability to produce new functional genomes. The distinction is important, see a very similar case of GPT-4b.
This may help with the second one:
https://www.lesswrong.com/posts/k5JEA4yFyDzgffqaL/guess-i-was-wrong-about-aixbio-risks
How about this one?
https://scottaaronson.blog/?p=9183
That appears to be the same one I linked.
Though possibly you grabbed the link in a superior way (not to comments).
The textbook reading group on “An Introduction to Universal Artificial Intelligence,” which introduces the necessary background for AIXI research, has started, and really gets underway this Monday (Sept. 8th) with sections 2.1-2.6.2. Now is about the last chance to easily jump in (since we have only read the intro chapter 1 so far). Please read in advance and be prepared to ask questions and/or solve some exercises. The first session had around 20-25 attendees; we will probably break up into groups of 5.
Meeting calendar is on the website: https://uaiasi.com/
Reach out to me in advance for a meeting link, DM or preferably colewyeth@gmail.com. Include your phone number if you want to be added to the WhatsApp group (optional).
Pitch for reading the book from @Alex_Altair: https://www.lesswrong.com/posts/nAR6yhptyMuwPLokc/new-intro-textbook-on-aixi
This is following up on the new AIXI research community announcement: https://www.lesswrong.com/posts/H5cQ8gbktb4mpquSg/launching-new-aixi-research-community-website-reading-group
I wonder if the reason that polynomial-time algorithms tend to be somewhat practical (not runtime n^100) is just that we aren’t smart enough to invent polynomial-time algorithms that are necessarily really complicated.
Like, the obvious way to get n^100 is to nest 100 for loops. A problem which can only be solved in polynomial time by nesting 100 for loops (presumably doing logically distinct things that cannot be collapsed!) is a problem that I am not going to solve in polynomial time…
Reasons I deem more likely:
1. Selection effect: if it’s infeasible you don’t work on it / don’t hear about it; in my personal experience, n^3 is already slow.
2. If the k in n^k is high, you probably have some representation where k is a parameter, and so you say it’s exponential in k, not that it’s polynomial.
1: Not true, I hear about exponential time algorithms! People work on all sorts of problems only known to have exponential time algorithms.
2: Yes, but the reason k only shows up as something we would interpret as a parameter and not as a result of the computational complexity of an algorithm invented for a natural problem is perhaps because of my original point—we can only invent the algorithm if the problem has structure that suggests the algorithm, in which case the algorithm is collapsible and k can be separated out as an additional input for a simpler algorithm.
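A toy sketch of that collapsibility point (the problem and parameters are arbitrary illustrations): when the k nested loops are doing interchangeable work, they collapse into one loop over the k-fold product space, with k exposed as a parameter of a simpler algorithm rather than as 100 hand-written nested loops.

```python
from itertools import product

def brute_force(n: int, k: int, predicate) -> bool:
    """Search {0, ..., n-1}^k for a point satisfying `predicate`.
    Still O(n^k) time, but written as one loop with k as an explicit
    parameter instead of k logically distinct nested loops."""
    return any(predicate(point) for point in product(range(n), repeat=k))

# Example: is there a 3-tuple of digits summing to 20? (e.g. (2, 9, 9))
print(brute_force(n=10, k=3, predicate=lambda p: sum(p) == 20))  # True
```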
I think the canonical high-degree polynomial problem is high-dimensional search. We usually don’t implement exact grid search because we can deploy Monte Carlo methods or gradient descent. I wonder if there are any hard lower bounds on approximation hardness for polynomial-time problems.
My guess would just be that it’s more of a strong law of small numbers type thing. Look at e.g. this.
A fun illustration of survivorship/selection bias is that nearly every time I find myself reading an older paper, I find it insightful, cogent, and clearly written.
Selection bias isn’t the whole story. The median paper in almost every field is notably worse than it was in, say, 1985. Academia is less selective than it used to be—in the U.S., there are more PhDs per capita, and the average IQ/test scores/whatever metric has dropped for every level of educational attainment.
Grab a journal that’s been around for a long time, read a few old papers and a few new papers at random, and you’ll notice the difference.
To what degree is this true regarding elite-level Ph.D. programs that are likely to lead to publication in (i) mathematics and/or (ii) computer science?
Separately, we should remember that academic selection is a relative metric, i.e. graded on a curve. So, when it comes to Ph.D. programs, is the median 2024 Ph.D. graduate more capable (however you want to define it) than the corresponding graduate from 1985? This is complex, involving their intellectual foundations, depth of their specialized knowledge, various forms of raw intelligence, attention span, collaborative skills, communication ability (including writing skills), and computational tools.
I realize what I’m about to say next may not be representative of the median Ph.D. student, but it feels to me the 2024 graduates of, say, Berkeley or MIT (not to mention, say, Thomas Jefferson High School) are significantly more capable than the corresponding 1985 graduates. Does my sentiment resonate with others and/or correspond to some objective metrics?
Based on my observations, I would also think the current publication-chasing culture could get people to push out papers more quickly (in some particular domains like CS), even though some papers may be only partially complete.
Rationality (and other) heuristics I’ve actually found useful for getting stuff done, but unfortunately you probably won’t:
1: Get it done quickly and soon. Every step of every process outside of yourself will take longer than expected, so the effective deadline is sooner than you might think. Also if you don’t get it done soon you might forget (or forget some steps).
1(A): 1 is stupidly important.
2: Do things that set off positive feedback loops. Aggressively try to avoid doing other things. I said aggressively.
2(A): Read a lot, but not too much.*
3: You are probably already making fairly reasonable choices over the action set you are considering. It’s easiest to fall short(er) of optimal behavior by failing to realize you have affordances. Discover affordances.
4: Eat.
(I think 3 is least strongly held)
*I’m describing how to get things done. Reading more has other benefits, for instance if you don’t yet know the thing you want to get done, and it’s pleasant and self-actualizing.
What was 3?
2(A) lol, fixed now.
The primary optimization target for LLM companies/engineers seems to be making them seem smart to humans, particularly the nerds who seem prone to using them frequently. A lot of money and talent is being spent on this. It seems reasonable to expect that they are less smart than they seem to you, particularly if you are in the target category. This is a type of Goodharting.
In fact, I am beginning to suspect that they aren’t really good for anything except seeming smart, and most rationalists have totally fallen for it, for example Zvi insisting that anyone who is not using LLMs to multiply their productivity is not serious (this is a vibe not a direct quote but I think it’s a fair representation of his writing over the last year). If I had to guess, LLMs have 0.99x’ed my productivity by occasionally convincing me to try to use them which is not quite paid for by very rarely fixing a bug in my code. The number is close to 1x because I don’t use them much, not because they’re almost useful. Lots of other people seem to have much worse ratios because LLMs act as a superstimulus for them (not primarily a productivity tool).
Certainly this is an impressive technology, surprising for its time, and probably more generally intelligent than anything else we have built—not going to get into it here, but my model is that intelligence is not totally “atomic” but has various pieces, some of which are present and some missing in LLMs. But maybe the impressiveness is not a symptom of intelligence, but the intelligence a symptom of impressiveness—and if so, it’s fair to say that we have (to varying degrees) been tricked.
I use LLMs throughout my personal and professional life. The productivity gains are immense. Yes, hallucination is a problem, but it’s just like spam/ads/misinformation on Wikipedia/the internet—a small drawback that doesn’t obviate the ginormous potential of the internet/LLMs.
I am 95% certain you are leaving value on the table.
I do agree straight LLMs are not generally intelligent (in the sense of universal intelligence/AIXI) and therefore not completely comparable to humans.
On LLMs vs. search on the internet: I agree that LLMs are very helpful in many ways, both personally and professionally, but in my opinion the worse parts of misinformation from LLMs compared to Wikipedia/the internet include: 1) it is relatively more unpredictable when the model will hallucinate, whereas for Wikipedia/the internet you would generally expect higher accuracy for simpler/purely factual/mathematical information; 2) it is harder to judge credibility without knowing the source of the information, whereas on the internet we can get some signals from the website domain, etc.
From my personal experience, I agree. I find myself unexcited about trying the newest LLM models. My main use-case in practice these days is Perplexity, and I only use it when I don’t care much about the accuracy of the results (which ends up being a lot, actually… maybe too much). Perplexity confabulates quite often even with accurate references in hand (but at least I can check the references). And it is worse than me at the basics of googling things, so it isn’t as if I expect it to find better references than me; the main value-add is in quickly reading and summarizing search results (although the new Deep Research option on Perplexity will at least iterate through several attempted searches, so it might actually find things that I wouldn’t have).
I have been relatively persistent about trying to use LLMs for actual research purposes, but the hallucination rate seems to go to 100% almost whenever an accurate result would be useful to me.
The hallucination rate does seem adequately low when talking about established mathematics (so long as you don’t ask for novel implications, such as applying ideas to new examples). For this and for other reasons I think they can be quite helpful for people trying to get oriented to a subfield they aren’t familiar with—it can make for a great study partner, so long as you verify what it says by checking other references.
Also decent for coding, of course, although the same caveat applies—coders who are already experts in what they are trying to do will get much less utility out of it.
I recently spoke to someone who made a plausible claim that LLMs were 10xing their productivity in communicating technical ideas in AI alignment with something like the following workflow:
Take a specific cluster of failure modes for thinking about alignment which you’ve seen often.
Hand-write a large, careful prompt document about the cluster of alignment failure modes, which includes many specific trigger-action patterns (if someone makes mistake X, then the correct counterspell to avoid the mistake is Y). This document is highly opinionated and would come off as rude if directly cited/quoted; it is not good communication. However, it is something you can write once and use many times.
When responding to an email/etc, load the email and the prompt document into Claude and ask Claude to respond to the email using the document. Claude will write something polite, informative, and persuasive based on the document, with maybe a few iterations of correcting Claude if its first response doesn’t make sense. The person also emphasized that things should be written in small pieces, as quality declines rapidly when Claude tries to do more at once.
They also mentioned that Claude is awesome at coming up with meme versions of ideas to include in powerpoints and such, which is another useful communication tool.
So, my main conclusion is that there isn’t a big overlap between what LLMs are useful for and what I personally could use. I buy that there are some excellent use-cases for other people who spend their time doing other things.
Still, I agree with you that people are easily fooled into thinking these things are more useful than they actually are. If you aren’t an expert in the subfield you’re asking about, then the LLM outputs will probably look great due to Gell-Mann Amnesia type effects. When checking to see how good the LLM is, people often check the easier sorts of cases which the LLMs are actually decent at, and then wrongly generalize to conclude that the LLMs are similarly good for other cases.
I expect he’d disagree, for example I vaguely recall him mentioning that LLMs are not useful in a productivity-changing way for his own work. And 10x specifically seems clearly too high for most things even where LLMs are very useful, other bottlenecks will dominate before that happens.
10x was probably too strong, but his posts are very clear he thinks it’s a large productivity multiplier. I’ll try to remember to link the next instance I see.
Found the following in the Jan 23 newsletter:
This is a Schelling point for AI X-risk activism.
I appreciate all of the reviews of “If Anyone Builds it…” particularly from those with strong disagreements. Of course, this community really rewards independent thinking (sometimes to the point of contrarian nitpicking) and this is a very good thing.
However, I basically agree with the central claims of the book and most of the details. So I want to point out that this is a good time for more ~typical normie activism, if you think the book is basically right.
I have personally convinced a few people to buy it, advertised it on basically every channel available to me, and plan to help run the Kitchener-Waterloo reading group (organized by @jenn).
Maybe it’s not the best possible introduction to X-risk, maybe it’s not even the best one out there, but it is the Schelling introduction to AI X-risk.
Maybe activism can possibly backfire in some cases, but non-activism does not look like a plan to me.
Maybe this isn’t the best possible time for activism, but it is the Schelling time for activism.
So if you are part of this coalition, even if you have some disagreements, please spread the word now.
Mathematics students are often annoyed that they have to worry about “bizarre or unnatural” counterexamples when proving things. For instance, differentiable functions without continuous derivative are pretty weird. Particularly engineers tend to protest that these things will never occur in practice, because they don’t show up physically. But these adversarial examples show up constantly in the practice of mathematics—when I am trying to prove (or calculate) something difficult, I will try to cram the situation into a shape that fits one of the theorems in my toolbox, and if those tools don’t naturally apply I’ll construct all kinds of bizarre situations along the way while changing perspective. In other words, bizarre adversarial examples are common in intermediate calculations—that’s why you can’t just safely forget about them when proving theorems. Your logic has to be totally sound as a matter of abstraction or interface design—otherwise someone will misuse it.
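A standard example of such a counterexample is a function that is differentiable everywhere but whose derivative is not continuous:

$$f(x) = \begin{cases} x^2 \sin(1/x), & x \neq 0 \\ 0, & x = 0 \end{cases}$$

Here $f'(0) = 0$, but for $x \neq 0$ we get $f'(x) = 2x\sin(1/x) - \cos(1/x)$, which has no limit as $x \to 0$.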
While I think the reaction against pathological examples can definitely make sense—in particular, some people have a bad habit of overfocusing on pathological examples—I do think mathematics is quite different from other fields: you want to prove that a property holds for all objects with a certain property, or that there exists an object with a certain property, and in these cases you can’t ignore the pathological examples, because they can either provide you with solutions to your problem or show why your approach can’t work.
This is why I didn’t exactly like Dalcy’s point 3 here:
https://www.lesswrong.com/posts/GG2NFdgtxxjEssyiE/dalcy-s-shortform#qp2zv9FrkaSdnG6XQ
There is also the reverse case, where it is often common practice in math or logic to ignore bizarre and unnatural counterexamples. For example, first-order Peano arithmetic is often identified with Peano arithmetic in general, even though the first-order theory allows the existence of highly “unnatural” numbers which are certainly not the natural numbers that Peano arithmetic is meant to be about.
Another example is the power set axiom in set theory. It is usually assumed to imply the existence of the power set of each infinite set. But the axiom only implies that the existence of such power sets is possible, i.e. that they can exist (in some models), not that they exist full stop. In general, non-categorical theories are often tacitly assumed to talk about some intuitive standard model, even though the axioms don’t specify it.
Eliezer talks about both cases in his Highly Advanced Epistemology 101 for Beginners sequence.
From Soares and Fallenstein, “Toward Idealized Decision Theory”:
“If someone cannot formally state what it means to find the best decision in theory, then they are probably not ready to construct heuristics that attempt to find the best decision in practice.”
This statement seems rather questionable. I wonder if it is a load-bearing assumption.
“Best” seems to do a lot of the work there.
I’m not sure what you mean. What is “best” is easily arrived at. If you’re a financier and your goal is to make money, then any formal statement of your decision problem will be about maximizing money. If you’re a swimmer and your goal is to win an Olympic gold medal, then a formal statement of your decision will obviously include “win gold medal”—part of the plan to execute it may include “beat the current world record for swimming in my category”—but “best” isn’t doing the heavy lifting here; the heavy lifting is done by the actual formal statement that encapsulates all the factors, such as what the milestones are.
And if someone doesn’t know what they mean when they think of what is best—then the statement holds true. If you don’t know what is “best” then you don’t know what practical heuristics will deliver you “good enough”.
To put it another way—what are the situations where not defining in clear terms what is best still leads to well-constructed heuristics to find the best decision in practice? (I will undercut myself—there is something to be said for exploration[1] and “F*** Around and Find Out” with no particular goal in mind.)
Bosh! Stephen said rudely. A man of genius makes no mistakes. His errors are volitional and are the portals of discovery. - Ulysses, James Joyce
is your only goal in life to make money?
is your only goal in life to win a gold medal?
and if they are, how do you define the direction such that you’re sure that among all possible worlds, maximizing this statement actually produces the world that maxes out goal-achievingness?
that’s where decision theories seem to me to come in. the test cases of decision theories are situations where maxing out, eg, CDT, does not in fact produce the highest-goal-score world. that seems to me to be where the difference Cole is raising comes up: if you’re merely moving in the direction of good worlds you can have more complex strategies that potentially make less sense but get closer to the best world, without having properly defined a single mathematical statement whose maxima is that best world. argmax(CDT(money)) may be less than genetic_algo(policy, money, iters=1b) even though argmax is a strict superlative, if the genetic algo finds something closer to, eg, argmax(FDT(money)).
edit: in other words, I’m saying “best” as opposed to “good”. what is good is generally easily arrived at. it’s not hard to find situations where what is best is intractable to calculate, even if you’re sure you’re asking for it correctly.
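A toy sketch of the kind of divergence being gestured at here, using the standard Newcomb setup (the payoffs and the perfect-predictor assumption are illustrative, not taken from the discussion above):

```python
# Newcomb-style toy: causally argmaxing over actions (CDT-flavored) picks a
# different action than optimizing over whole policies, and ends up with less
# money when the predictor is accurate.

ACTIONS = ["one-box", "two-box"]

def payoff(action, predicted_action):
    """Opaque box holds $1M iff the predictor expected one-boxing;
    the transparent box always adds $1k for two-boxers."""
    opaque = 1_000_000 if predicted_action == "one-box" else 0
    transparent = 1_000 if action == "two-box" else 0
    return opaque + transparent

def cdt_choice(fixed_prediction):
    # CDT-style: treat the prediction (box contents) as already fixed and argmax.
    return max(ACTIONS, key=lambda a: payoff(a, fixed_prediction))

def policy_choice():
    # Policy-level: the prediction tracks whichever policy you actually adopt.
    return max(ACTIONS, key=lambda a: payoff(a, a))

# Whatever it assumes about the boxes, CDT two-boxes (two-boxing dominates),
# so with a perfect predictor it walks away with $1,000.
assert all(cdt_choice(p) == "two-box" for p in ACTIONS)
print(payoff("two-box", "two-box"))              # 1000
print(payoff(policy_choice(), policy_choice()))  # 1000000
```

The point being that a formally stated objective (“money”) plus an argmax is not yet enough; the decision theory wrapped around the argmax changes which world you end up in.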
by using the suffix “-(e)st”. “The fastest” “the richest” “the purple-est” “the highest” “the westernmost”. That’s the easy part—defining theoretically what is best. Mapping that theory to reality is hard.
I don’t know what is in theory the best possible life I can live, but I do know ways that I can improve my life significantly.
Can you rephrase that—because you’re mentioning theory and possibility at once which sounds like an oxymoron to me. That which is in theory best implies that which is impossible or at least unlikely. If you can rephrase it I’ll probably be able to understand what you mean.
Also, if you had a ‘magic wand’ and could change a whole raft of things at once, do you have a vision of your “best” life that you preference? Not necessarily a likely or even possible one. But one that of all fantasies you can imagine is preeminent? That seems to me to be a very easy way to define the “best”—it’s the one that the agent wants most. I assume most people have their visions of their own “best” lives, am I a rarity in this? Or do most people just kind of never think about what-ifs and have fantasies? And isn’t that, or the model of the self and your own preferences that influences that fantasy going to similarly be part of the model that dictates what you “know” would improve your life significantly.
Because if you consider it an improvement, then you see it as being better. It’s basic English: Good, Better, Best.
I think that “ruggedness” and “elegance” are alternative strategies for dealing with adversity—basically tolerating versus preparing for problems. Both can be done more or less skillfully: low-skilled ruggedness is just being unprepared and constantly suffering, but the higher skilled version is to be strong, healthy, and conditioned enough to survive harsh circumstances without suffering. Low-skilled elegance is a waste of time (e.g. too much makeup but terrible skin) and high skilled elegance is… okay basically being ladylike and sophisticated. Yes I admit it this is mostly about gender.
Other examples: it’s rugged to have a very small number of high-quality possessions you can easily throw in a backpack in under 20 minutes, including 3 outfits that cover all occasions. It’s elegant to travel with three suitcases containing everything you could possibly need to look and feel your best, including both an ordinary umbrella and a sun umbrella.
I also think a lot of misunderstanding between genders results from these differing strategies, because to some extent they both work but are mutually exclusive. Elegant people may feel taken advantage of because everyone starts expecting them to do all the preparation. Rugged people may feel they aren’t given enough autonomy and get impatient (“no, I’ll be fine without sunscreen”). There are obvious advantages to having a rugged and elegant member of a team or couple though.
Thanks to useful discussions with my friends / family: Ames, Adriaan, Lauren. Loosely expect I picked the idea up from someone else, can’t be original.
This one highlights that the sense of “elegant” you mean is not the math & engineering sense, which is associated with minimalism.
If you asked me to guess what the ‘elegant’ counterpoint to ‘traveling with a carefully-curated set of the very best prepper/minimalist/nomad/hiker gear which ensures a bare minimum of comfort’ would be, I would probably say something like ‘traveling with nothing but cash/credit card/smartphone’. You have elegantly solved the universe of problems you encounter while traveling by choosing a single simple tool which can obtain nearly anything from the universe of solutions.
Maybe “grace” is a better term than elegance?
Your categories are not essentially gendered, although I understand why we feel that way. For example, in your travel-packing example my wife would be considered rugged while I would be considered elegant, under your definitions. I also think that in traditional Chinese culture, both of your definitions would be considered masculine. (Sorry women, I guess you get nothing lol)
I also think that we apply these strategies unequally in different parts of our lives. I’d guess if you have to give a research talk at a conference, you’d take an ‘elegant’ approach of “let me prepare my talk well and try to anticipate possible questions the audience will have” instead of “let me do the minimal prep and then just power through any technical difficulties or difficult questions’.
Maybe our gender socialization leads us to favour different strategies in different situations along gendered lines?
I think these things mostly split along gender lines but there are many exceptions, just like pretty much everything else about gender.
to complicate this along gender lines for fun, when i first read your first sentence i totally reversed the descriptions since it’s rugged and masculine to tackle problems and elegant and feminine to tolerate them. per a random edgy tumblr i follow:
that sounds more “rugged” than “elegant” by your definitions, no?
I also read that little edgy story and thought at the time that sentence made no sense.
I still think that.
Since this is mid-late 2025, we seem to be behind the aggressive AI 2027 schedule? The claims here are pretty weak, but if LLMs really don’t boost coding speed, this description still seems to be wrong.
[edit: okay actually it’s pretty much mid 2025 still, months don’t count from zero though probably they should because they’re mod 12]
I don’t think there’s enough evidence to draw hard conclusions about this section’s accuracy in either direction, but I would err on the side of thinking AI 2027’s description is correct.
Footnote 10, visible in your screenshot, reads:
SOTA models score at:
• 83.86% (codex-1, pass@8)
• 80.2% (Sonnet 4, pass@several, unclear how many)
• 79.4% (Opus 4, pass@several)
(Is it fair to allow pass@k? This Manifold Market doesn’t allow it for its own resolution, but here I think it’s okay, given that the footnote above makes claims about ‘coding agents’, which presumably allow iteration at test time.)
Also, note the following paragraph immediately after your screenshot:
AI Twitter sure is full of both impressive cherry-picked examples and stories about bungled tasks. I also agree that the claim about “find[ing] ways to fit AI agents into the workflows” is exceedingly weak. But it’s certainly happening. A quick Google for “AI agent integration” turns up this article from IBM, where agents are diffusing across multiple levels of the company.
If I understand correctly, Claude’s pass@X benchmarks mean sampling multiple times and taking the best result. This is valid so long as the compute cost doesn’t exceed the equivalent cost of an engineer.
codex’s pass@8 score seems to be saying “the correct solution was present among 8 attempts, but the model doesn’t actually know which result is correct”. That shouldn’t count.
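For reference, the standard unbiased pass@k estimator (from the original HumanEval/Codex paper), with $n$ samples per problem of which $c$ pass, is

$$\text{pass@}k = \mathbb{E}_{\text{problems}}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right],$$

i.e. the probability that at least one of $k$ sampled attempts passes. It does not require the model to identify which attempt is correct, which is exactly the objection above.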
Why do I see no higher than about 75% here?
https://www.swebench.com
Yeah, I wanted to include that paragraph but it didn’t fit in the screenshot. It does seem slightly redeeming for the description. Certainly the authors hedged pretty heavily.
Still, I think that people are not saving days by chatting with AI agents on slack. So there’s a vibe here which seems wrong. The vibe is that these agents are unreliable but are offering very significant benefits. That is called into question by the METR report showing they slowed developers down. There are problems with that report and I would love to see some follow-up work to be more certain.
I appreciate your research on the SOTA SWEBench-Verified scores! That’s a concrete prediction we can evaluate (less important than real world performance, but at least more objective). Since we’re now in mid-late 2025 (not mid 2025), it appears that models are slightly behind their projections even for pass@k, but certainly they were in the right ballpark!
Sorry, this is the most annoying kind of nitpicking on my part, but since I guess it’s probably relevant here (and for your other comment responding to Stanislav down below), the center point of the year is July 2, 2025. So we’re just two weeks past the absolute mid-point – that’s 54.4% of the way through the year.
Also, the codex-1 benchmarks were released on May 16, while Claude 4’s were announced on May 22 (certainly before the midpoint).
The prediction is correct on all counts, and perhaps slightly understates progress (though it obviously makes weak/ambiguous claims across the board).
The claim that “coding and research agents are beginning to transform their professions” is straightforwardly true (e.g. 50% of Google lines of code are now generated by AI). The METR study was concentrated in March (which is early 2025).
And it is not currently “mid-late 2025”, it is 16 days after the exact midpoint of the year.
Where is that 50% number from? Perhaps you are referring to this post from google research. If so, you seem to have taken it seriously out of context. Here is the text before the chart that shows 50% completion:
This is referring to inline code completion—so it’s more like advanced autocomplete than an AI coding agent. It’s hard to interpret this number, but it seems very unlikely this means half the coding is being done by AI, and much more likely that it is often easy to predict how a line of code will end given the first half of that line and the previous context. Probably 15-20% of what I type into a standard linux terminal is autocompleted without AI?
Also, the right metric is how much AI assistance is speeding up coding. I know of only one study on this, from METR, which showed that it is slowing down coding.
Progress-wise this seems accurate, but the usefulness gap is probably larger than the one this paints.
Two days later, is this still a fail? ChatGPT Agent is supposed to do exactly that. There seems to be a research model within OpenAI that is capable of getting gold on the IMO without any tools.
Maybe it does not meet the expectations yet. Maybe it will with GPT-5 release. We do not know if the new unreleased model is capable of helping with research. However, it’s worth considering the possibility that it could be on a slightly slower timeline and not a complete miss.
i wonder to what extent leadership at openai see ai 2027 as a bunch of milestones that they need to meet, to really be as powerful/scary as they’re said to be.
e.g. would investors/lenders be more hesitant if openai seems to be ‘lagging behind’ ai 2027 predictions?
Yeah, I wouldn’t be surprised if these timelines are at least somewhat hyperstitious
Yeah, well, let’s wait and see what GPT-5 looks like.
But it isn’t August or September yet. Maybe someone will end up actually creating capable agents. In addition, the amount of operations used for creating Grok 4 was estimated as 4e27--6e27, which seems to align with the forecast. The research boost rate by Grok 4 or a potentially tweaked model wasn’t estimated. Maybe Grok 4 or an AI released in August will boost research speed?
It was indicated in the opening slide of Grok 4 release livestream that Grok 4 was pretrained with the same amount of compute as Grok 3, which in turn was pretrained on 100K H100s, so probably 3e26 FLOPs (40% utilization for 3 months with 1e15 FLOP/s per chip). RLVR has a 3x-4x lower compute utilization than pretraining, so if we are insisting on counting RLVR in FLOPs, then 3 months of RLVR might be 9e25 FLOPs, for the total of 4e26 FLOPs.
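Spelling that estimate out (3 months ≈ 7.8e6 seconds; the 40% utilization is the assumption stated above):

$$10^{5}\ \text{chips} \times 10^{15}\ \tfrac{\text{FLOP}}{\text{s}} \times 0.4 \times 7.8\times 10^{6}\ \text{s} \approx 3\times 10^{26}\ \text{FLOPs}.$$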
Stargate Abilene will be 400K chips in GB200 NVL72 racks in 2026, which is 10x more FLOP/s than 100K H100s. So it’ll be able to train 4e27-8e27 FLOPs models (pretraining and RLVR, in 3+3 months), and it might be early 2027 when they are fully trained. (Google is likely to remain inscrutable in their training compute usage, though Meta might also catch up by then.)
(I do realize it’s probably some sort of typo, either yours or in your unnamed source. But 10x is almost 2 years of even the current fast funding-fueled scaling, that’s not a small difference.)
We’ve been going on back and forth on this a bit—it seems like your model suggests AGI in 2027 is pretty unlikely?
That is, we see the first generation of massively scaled RLVR around 2026/2027. So it kind of has to work out of the box for AGI to arrive that quickly?
I suppose this is just speculation though. Maybe it’s useful enough that the next generation is somehow much, much faster to arrive?
By 2027, we’ll also have 10x scaled-up pretraining compared to current models (trained on 2024 compute). And correspondingly scaled RLVR, with many diverse tool-using environments that are not just about math and coding contest style problems. If we go 10x lower than current pretraining, we get original GPT-4 from Mar 2023, which is significantly worse than the current models. So with 10x higher pretraining than current models, the models of 2027 might make significantly better use of RLVR training than the current models can.
Also, 2 years might be enough time to get some sort of test-time training capability started, either with novel or currently-secret methods, or by RLVRing models to autonomously do post-training on variants of themselves to make them better at particular sources of tasks during narrow deployment. Apparently Sutskever’s SSI is rumored to be working on the problem (at 39:25 in the podcast), and overall this seems like the most glaring currently-absent faculty. (Once it’s implemented, something else might end up a similarly obvious missing piece.)
I’d give it 10% (for 2025-2027). From my impression of the current capabilities and the effect of scaling so far, the remaining 2 OOMs of compute seem like a 30% probability of getting there (by about 2030), with a third of it in the first 10x of the remaining scaling, that is 10% with 2026 compute (for 2027 models). After 2029, scaling slows down to a crawl (relatively speaking), so maybe another 50% for the 1000x of scaling in 2030-2045 when there’ll also be time for any useful schlep, with 20% remaining for 2045+ (some of it from a coordinated AI Pause, which I think is likely to last if at all credibly established). If the 5 GW AI training systems don’t get built in 2028-2029, they are still likely to get built a bit later, so this essentially doesn’t influence predictions outside the 2029-2033 window, some probability within it merely gets pushed a bit towards the future.
So this gives a median of about 2034. Once AGI is still not working in the early 2030s even with more time for schlep, probability at that level of compute starts going down, so 2030s are front-loaded in probability even though compute is not scaling faster in the early 2030s than later.
The section I shared is about mid 2025. I think August-September is late 2025.
Early: January, February, March, April
Mid: May, June, July, August
Late: September, October, November, December
Okay yes but this thread of discussion has gone long enough now I think—we basically agree up to a month.
It looks like the market is with Kokotajlo on this one (apparently this post must be expanded to see the market).
For reference, I’d also bet on 8+ hour task lengths (on METR’s benchmark[1]) by 2027. Probably significantly earlier; maybe early 2026, or even the end of this year. Would not be shocked if OpenAI’s IMO-winning model already clears that.
You say you expect progress to stall at 4-16 hours because solving such problems would require AIs to develop sophisticated models of them. My guess is that you’re using intuitions regarding at what task-lengths it would be necessary for a human. LLMs, however, are not playing by the same rules: where we might need a new model, they may be able to retrieve a stored template solution. I don’t think we really have any idea at what task length this trick would stop working for them. I could see it being “1 week”, or “1 month”, or “>1 year”, or “never”.
I do expect “<1 month”, though. Or rather, that even if the LLM architecture is able to support arbitrarily big templates, the scaling of data and compute will run out before this point; and then plausibly the investment and the talent pools would dry up as well (after LLMs betray everyone’s hopes of AGI-completeness).
Not sure what happens if we do get to “>1 year”, because on my model, LLMs might still not become AGIs despite that. Like, they would still be “solvers of already solved problems”, except they’d be… able to solve… any problem in the convex hull of the problems any human ever solved in 1 year...? I don’t know, that would be very weird; but things have already gone in very weird ways, and this is what the straightforward extrapolation of my current models says. (We do potentially die there.[2])
Aside: On my model, LLMs are not on track to hit any walls. They will keep getting better at the things they’ve been getting better at, at the same pace, for as long as the inputs to the process (compute, data, data progress, algorithmic progress) keep scaling at the same rate. My expectation is instead that they’re just not going towards AGI, so “no walls in their way” doesn’t matter; and that they will run out of fuel before the cargo cult of them becomes Singularity-tier transformative.
(Obviously this model may be wrong. I’m still fluctuating around 80%.)
Recall that it uses unrealistically “clean” tasks and accepts unviable-in-practice solutions: the corresponding horizons for real-world problem-solving seem much shorter. As do the plausibly-much-more-meaningful 80%-completion horizons – that one currently sits at 26 minutes. (Something like 95%-completion horizons may actually be the most representative metric, though I assume there are some issues with estimating that.)
Probably this way:
Ok, but surely there has to be something they aren’t getting better at (or are getting better at too slowly). Under your model they have to hit a wall in this sense.
I think your main view is that LLMs won’t ever complete actually hard tasks and current benchmarks just aren’t measuring actually hard tasks or have other measurement issues? This seems inconsistent with saying they’ll just keep getting better, though, unless you’re hypothesizing truly insane benchmark flaws, right?
Like, if they stop improving at <1 month horizon lengths (as you say immediately above the text I quoted) that is clearly a case of LLMs hitting a wall right? I agree that compute and resources running out could cause this, but it’s notable that we expect ~1 month in not that long, like only ~3 years at the current rate.
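To spell out the “~3 years” arithmetic (the specific numbers here are illustrative assumptions: a current 50%-horizon of roughly 2 hours, a “month” of ~170 working hours, and a doubling time in the 4–7 month range):

$$\log_2\!\left(\frac{170\ \text{h}}{2\ \text{h}}\right) \approx 6.4\ \text{doublings}, \qquad 6.4 \times (4\text{–}7\ \text{months}) \approx 2\text{–}4\ \text{years}.$$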
That’s only if the faster within-RLVR rate that has been holding during the last few months persists. On my current model, 1 month task lengths at 50% happen in 2030-2032, since compute (being the scarce input of scaling) slows down compared to today, and I don’t particularly believe in incremental algorithmic progress as it’s usually quantified, so it won’t be coming to the rescue.
Compared to the post I did on this 4 months ago, I have even lower expectations that the 5 GW training systems (for individual AI companies) will arrive on trend in 2028, they’ll probably get delayed to 2029-2031. And I think the recent RLVR acceleration of the pre-RLVR trend only pushes it forward a year without making it faster, the changed “trend” of the last few months is merely RLVR chip-hours catching up to pretraining chip-hours, which is already essentially over. Though there are still no GB200 NVL72 sized frontier models and probably no pretraining scale RLVR on GB200 NVL72s (which would get better compute utilization), so that might give the more recent “trend” another off-trend push first, perhaps as late as early 2026, but then it’s not yet a whole year ahead of the old trend.
I distinguish “the LLM paradigm hitting a wall” and “the LLM paradigm running out of fuel for further scaling”.
Yes, precisely. Last I checked, we expected scaling to run out by 2029ish, no?
Ah, reading the comments, I see you expect there to be some inertia… Okay, 2032 / 7 more years would put us at “>1 year” task horizons. That does make me a bit more concerned. (Though 80% reliability is several doublings behind, and I expect tasks that involve real-world messiness to be even further behind.)
“Ability to come up with scientific innovations” seems to be one.
Like, I expect they are getting better at the underlying skill. If you had a benchmark which measures some toy version of “produce scientific innovations” (AidanBench?), and you plotted frontier models’ performance on it against time, you would see the number going up. But it currently seems to lag way behind other capabilities, and I likewise don’t expect it to reach dangerous heights before scaling runs out.
The way I would put it, the things LLMs are strictly not improving on are not “specific types of external tasks”. What I think they’re not getting better at – because it’s something they’ve never been capable of doing – are specific cognitive algorithms which allow to complete certain cognitive tasks in a dramatically more compute-efficient manner. We’ve talked about this some before.
I think that, in the limit of scaling, the LLM paradigm is equivalent to AGI, but that it’s not a very efficient way to approach this limit. And it’s less efficient along some dimensions of intelligence than along others.
This paradigm attempts to scale certain modules a generally intelligent mind would have to ridiculous levels of power in order to make up for the lack of other necessary modules. This will keep working to improve performance across all tasks, as long as you keep feeding LLMs more data and compute. But there seems to be only a few “GPT-4 to GPT-5” jumps left, and I don’t think it’d be enough.
I think if this were right LLMs would already be useful for software engineering and able to make acceptable PRs.
I also guess that the level of agency you need to actually beat Pokémon probably requires task lengths somewhere around 4 hours.
We’ll see who’s right—bet against me if you haven’t already! Though maybe it’s not a good deal anymore. I can see it going either way.
They are sometimes able to make acceptable PRs, usually when context gathering for the purpose of iteratively building up a model of the relevant code is not a required part of generating said PR.
It seems to me that current-state LLMs learn hardly anything from the context, since they have trouble fitting it into their attention span. For example, GPT-5 can create fun stuff from just one prompt, and an unpublished LLM solved five out of six problems of IMO 2025, while the six problems together can be expressed in about 3k bytes. However, METR found that “on 18 real tasks from two large open-source repositories, early-2025 AI agents often implement functionally correct code that cannot be easily used as-is, because of issues with test coverage, formatting/linting, or general code quality.”
I strongly suspect that this bottleneck will be ameliorated by using neuralese[1] with big internal memory.
Neuralese with big internal memory
The Meta paper which introduced neuralese had GPT-2 trained to have the thought at the end fed back in at the beginning. Alas, the number of bits transferred is equal to the number of bits in a floating-point number multiplied by the size of the final layer. A CoT, by comparison, generates only ~16.6 extra bits of information per token.
At the cost of an absolute loss of interpretability, neuralese on steroids could have an LLM of GPT-3’s scale transfer tens of millions of bits[2] in the latent space. Imagine GPT-3 175B (which had 96 layers and 12288 neurons in each) receiving an augmentation using the last layer’s results as a steering vector at the beginning, the second-to-last layer as a steering vector at the second layer, etc. Or passing the steering vectors through a matrix. These amplifications at most double the compute required to run GPT-3, while requiring extra millions of bytes of dynamic memory.
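Back-of-the-envelope for the “tens of millions of bits” figure, assuming 16-bit activations passed for every one of the 96 layers:

$$96\ \text{layers} \times 12288\ \text{activations} \times 16\ \text{bits} \approx 1.9 \times 10^{7}\ \text{bits per step}.$$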
For comparison, the human brain’s short-term memory alone is described by the activations of around 86 billion neurons. And that’s ignoring medium-term memory and long-term memory...
However, there is Knight Lee’s proposal where the AIs are to generate multiple tokens instead of using versions of neuralese.
For comparison, the longest context window is 1M tokens long and is used by Google Gemini. 1M tokens are represented by about 16.6M bits.
People have been talking about neuralese since at least when AI 2027 was published and I think much earlier, but it doesn’t seem to have materialized.
I think LLMs can be useful for software engineering and can sometimes write acceptable PRs. (I’ve very clearly seen both of these first hand.) Maybe you meant something slightly weaker, like “AIs would be able to write acceptable PRs at a rate of >1/10 on large open source repos”? I think this is already probably true, at least with some scaffolding and inference time compute. Note that METR’s recent results were on 3.7 sonnet.
I’m referring to METR’s recent results. Can you point to any positive results on LLMs writing acceptable PRs? I’m sure that they can in some weak sense e.g. a sufficiently small project with sufficiently low standards, but as far as I remember the METR study concluded zero acceptable PRs in their context.
METR found that 0 of the 4 PRs which passed test cases and which they reviewed were also acceptable to merge. This was for Sonnet 3.7 on large open-source repos with default infrastructure.
The rate at which PRs passed test cases was also low, but if you’re focusing on the PR being viable to merge conditional on passing test cases, the “0/4” number is what you want. (And this is consistent with 10%—or some chance of 35%—of PRs being mergeable conditional on passing test cases; we don’t have a very large sample size here.)
I don’t think this is much evidence that AI can’t sometimes write acceptable PRs in general, and there are examples of AIs doing this. On small projects I’ve worked on, AIs from a long time ago have written a big chunk of code ~zero-shot. Anecdotally, I’ve heard of people having success with AIs completing tasks zero-shot. I don’t know what you mean by “PR” that doesn’t include this.
I think I already answered this:
Very little liquidity though
Hence sharing here—I’m not buying (at least for now) because I’m curious where it ends up, but obviously I think “Wyeth wins” shares are at a great price right now ;)
I’ve thrown on some limit orders if anyone is strongly pro-Kokotajlo.
Particularly after my last post, I think my LessWrong writing has had a bit too high of a confidence / effort ratio. Possibly I just know the norms of this site well enough lately that I don’t feel as much pressure to write carefully. I think I’ll limit my posting rate a bit while I figure this out.
LW doesn’t punish, it upvotes-if-interesting and then silently judges.
(Effort is not a measure of value, it’s a measure of cost.)
Yeah, I was thinking greater effort is actually necessary in this case. For context, my lower effort posts are usually more popular. Also the ones that focus on LLMs which is really not my area of expertise.
mood
The hedonic treadmill exists because minds are built to climb utility gradients—absolute utility levels are not even uniquely defined, so as long as your preferences are time-consistent you can just renormalize before maximizing the expected utility of your next decision.
I find this vaguely comforting. It’s basically a decision-theoretic and psychological justification for stoicism.
(must have read this somewhere in the sequences?)
I think self-reflection in bounded reasoners justifies some level of “regret,” “guilt,” “shame,” etc., but the basic reasoning above should hold to first order, and these should all be treated as corrections and for that reason should not get out of hand.
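To spell out the renormalization point: expected-utility preferences are invariant under positive affine transformations of the utility function, so shifting the zero point (e.g. to “wherever I am now”) never changes the optimal decision:

$$\arg\max_{d}\ \mathbb{E}\left[a\,U(x) + b \mid d\right] = \arg\max_{d}\ \mathbb{E}\left[U(x) \mid d\right] \quad \text{for any } a > 0,\ b \in \mathbb{R}.$$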
AI-specific pronouns would actually be kind of helpful. “They” and “It” are both frequently confusing. “He” and “she” feel anthropomorphic and fake.
Perhaps LLMs are starting to approach the intelligence of today’s average human: capable of only limited original thought, unable to select and autonomously pursue a nontrivial coherent goal across time, learned almost everything they know from reading the internet ;)
This doesn’t seem to be reflected in the general opinion here, but it seems to me that LLMs are plateauing and possibly have already plateaued a year or so ago. Scores on various metrics continue to go up, but this tends to provide weak evidence because they’re heavily gamed and sometimes leak into the training data. Still, those numbers overall would tend to update me towards short timelines, even with their unreliability taken into account—however, this is outweighed by my personal experience with LLMs. I just don’t find them useful for practically anything. I have a pretty consistently correct model of the problems they will be able to help me with, and it’s not a lot—maybe a broad introduction to a library I’m not familiar with or detecting simple bugs. That model has worked for a year or two without expanding the set much. Also, I don’t see any applications to anything economically productive except for fluffy chatbot apps.
Huh o1 and the latest Claude were quite huge advances to me. Basically within the last year LLMs for coding went to “occasionally helpful, maybe like a 5-10% productivity improvement” to “my job now is basically to instruct LLMs to do things, depending on the task a 30% to 2x productivity improvement”.
I’m in Canada so can’t access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I’m not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine.
Use Chatbot Arena, both versions of Claude 3.5 Sonnet are accessible in Direct Chat (third tab). There’s even o1-preview in Battle Mode (first tab), you just need to keep asking the question until you get o1-preview. In general Battle Mode (for a fixed question you keep asking for multiple rounds) is a great tool for developing intuition about model capabilities, since it also hides the model name from you while you are evaluating the response.
Just an FYI unrelated to the discussion—all versions of Claude are available in Canada through Anthropic, you don’t even need third party services like Poe anymore.
Source: https://www.anthropic.com/news/introducing-claude-to-canada
Base model scale has only increased maybe 3-5x in the last 2 years, from 2e25 FLOPs (original GPT-4) up to maybe 1e26 FLOPs[1]. So I think to a significant extent the experiment of further scaling hasn’t been run, and the 100K H100s clusters that have just started training new models in the last few months promise another 3-5x increase in scale, to 2e26-6e26 FLOPs.
Right, the metrics don’t quite capture how smart a model is, and the models haven’t been getting much smarter for a while now. But it might be simply because they weren’t scaled much further (compared to original GPT-4) in all this time. We’ll see in the next few months as the labs deploy the models trained on 100K H100s (and whatever systems Google has).
This is 3 months on 30K H100s, $140 million at $2 per H100-hour, which is plausible, but not rumored about specific models. Llama-3-405B is 4e25 FLOPs, but not MoE. Could well be that 6e25 FLOPs is the most anyone trained for with models deployed so far.
I’ve noticed they perform much better on graduate-level ecology/evolution questions (in a qualitative sense—they provide answers that are more ‘full’ as well as technically accurate). I think translating that into a “usefulness” metric is always going to be difficult though.
The last few weeks I felt the opposite of this. I kind of go back and forth on thinking they are plateauing and then I get surprised with the new Sonnet version or o1-preview. I also experiment with my own prompting a lot.
I’ve noticed occasional surprises in that direction, but none of them seem to shake out into utility for me.
Is this a reaction to OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows?
No that seems paywalled, curious though?
I’ve been waiting to say this until OpenAI’s next larger model dropped, but this has now failed to happen for so long that it’s become its own update, and I’d like to state my prediction before it becomes obvious.
It seems like the most impressive discoveries from AI still consistently come from narrow systems: https://deepmind.google/discover/blog/discovering-new-solutions-to-century-old-problems-in-fluid-dynamics/
I don’t think AlphaEvolve counts as a narrow system, and it discovered a bunch of things.
Cool post though, thanks for linking it.
I didn’t find any of its discoveries very impressive, they were mostly improved combinatorial bounds if I remember properly.
An ASI perfectly aligned to me must literally be a smarter version of myself. Anything less than that is a compromise between my values and the values of society. Such a compromise at its extreme fills me with dread. I would much rather live in a society of some discord between many individually aligned ASI’s, than build a benevolent god.
An ASI aligned to a group of people likely should dedicate sovereign slivers of compute (optimization domains) for each of those people, and those people could do well with managing their domain with their own ASIs aligned to each of them separately. Optimization doesn’t imply a uniform pureed soup, it’s also possible to optimize autonomy, coordination, and interaction, without mixing them up.
Values judge what should be done, but also what you personally should be doing. An ASI value aligned to you will be doing the things that should be done (according to you, on reflection), but you wouldn’t necessarily endorse that you personally should be doing those things. Like, I want the world to be saved, but I don’t necessarily want to be in a position to need to try to save the world personally.
So an ASI perfectly aligned to you might help uplift you into a smarter version of yourself as one of its top priorities, and then go on to do various other things you’d approve of on reflection. But you wouldn’t necessarily endorse that it’s the smarter version of yourself that is doing those other things, you are merely endorsing that they get done.
I’m confused about that. I think you might be wrong, but I’ve heard this take before. If what you want is something that looks like a benevolent god, but one according to your own design, then that’s the “cosmopolitan empowerment by ‘I just want cosmopolitanism’” scenario, which I don’t trust; so if I had the opportunity to design an AI, I would do my best to guarantee its cosmopolitanism-as-in-a-thing-others-actually-approve-of, for basically “values-level LDT” reasons. See also the interdimensional council of cosmopolitanisms.
I think there’s more leeway here. E.g. instead of a copy of you, a “friend” ASI.
A benevolent god that understands your individual values and respects them seems pretty nice to me. Especially compared to a world of competing, individually aligned ASIs. (if your values are in the minority)
@Thomas Kwa will we see task length evaluations for Claude Opus 4 soon?
Anthropic reports that Claude can work on software engineering tasks coherently for hours, but it’s not clear if this means it can actually perform tasks that would take a human hours. I am slightly suspicious because they reported that Claude was making better use of memory on Pokémon, but this did not actually cash out as improved play. This seems like a fairly decisive test of my prediction that task lengths would stagnate at this point; if it does succeed at hours long tasks, I will want to see a careful evaluation of which tasks may or may not have been leaked, are the tasks cleaner than typical hours long SE tasks, etc.
I don’t run the evaluations but probably we will; no timeframe yet though as we would need to do elicitation first. Claude’s SWE-bench Verified scores suggest that it will be above 2 hours on the METR task set; the benchmarks are pretty similar apart from their different time annotations.
That’s a bit higher than I would have guessed. I compared the known data points that have SWE-bench and METR medians (sonnet 3.5,3.6,3.7, o1, o3, o4-mini) and got an r^2 = 0.96 model assuming linearity between log(METR_median) and log(swe-bench-error).
That gives an estimate more like 110 minutes for a SWE-bench score of 72.7%, which works out to a Sonnet doubling time of ~3.3 months. (If I throw out o4-mini, the estimate is ~117 minutes… still below 120.)
It also would imply an 85% SWE-bench score is something like a 6–6.5 hour METR median.
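For clarity, the model being fit is (with $a$, $b$ estimated from the published SWE-bench / METR-median pairs, not reproduced here, and $s$ the SWE-bench score):

$$\log(\text{METR median}) = a + b\,\log(1 - s),$$

so the ~110-minute figure comes from evaluating the fitted line at $s = 0.727$, and the 6–6.5 hour figure from evaluating it at $s = 0.85$.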
Since reasoning trace length increases with more steps of RL training (unless intentionally constrained), probably underlying scaling of RL training by AI companies will be observable in the form of longer reasoning traces. Claude 4 is more obviously a pretrained model update, not necessarily a major RLVR update (compared to Claude 3.7), and coherent long task performance seems like something that would greatly benefit from RLVR if it applies at all (which it plausibly does).
So I don’t particularly expect Claude 4 to be much better on this metric, but some later Claude ~4.2-4.5 update with more RLVR post-training released in a few months might do much better.
We can still check if it lies on the slower exponential curve projected from before reasoning models were introduced.
Sure, but trends like this only say anything meaningful across multiple years, any one datapoint adds almost no signal, in either direction. This is what makes scaling laws much more predictive, even as they are predicting the wrong things. So far there are no published scaling laws for RLVR, the literature is still developing a non-terrible stable recipe for the first few thousand training steps.
It looks like Gemini is self-improving in a meaningful sense:
https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Some quick thoughts:
This has been going on for months; on the bullish side (for ai progress, not human survival) this means some form of self-improvement is well behind the capability frontier. On the bearish side, we may not expect a further speed up on the log scale (since it’s already factored in to some calculations).
I did not expect this degree of progress so soon; I am now much less certain about the limits of LLMs and less prepared to dismiss very short timelines.
With that said… the problems that it has solved do seem to be somewhat exhaustive search flavored. For instance it apparently solved an open math problem, but this involved arranging a bunch of spheres. I’m not sure to what degree LLM insight was required beyond just throwing a massive amount of compute at trying possibilities. The self-improvements GDM reports are similar—like faster matrix multiplication in I think the 4x4 case. I do not know enough about these areas to judge whether AI is essential here or whether a vigorous proof search would work. At the very least, the system does seem to specialize in problems with highly verifiable solutions. I am convinced, but not completely convinced.
Also, for the last couple of months whenever I’ve asked why LLMs haven’t produced novel insights, I’ve often gotten the response “no one is just letting them run long enough to try.” Apparently GDM did try it (as I expected) and it seems to have worked somewhat well (as I did not expect).
Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI
But I do have quick thoughts as well;
Kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).
It seems to me like AlphaEvolve is more-or-less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor) notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to the improvement of AI hardware. What AlphaEvolve seems to do is to unify all of that into a superhuman model for those multiple uses. In the accompanying podcast they give us some further information:
The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
They have not tried to distill all that data into a new model yet, which seems strange to me considering they’ve had it for a year now.
They say that a lot of improvements come from the base model’s quality.
They do present the whole thing as part of research rather than a product
So yeah, I can definitely see a path for large gains in the future, though for now those are still on similar timetables as per their own admission. They expect further improvements when base models improve, and are hoping that future versions of AlphaEvolve can in turn shorten the training time for models, improve the hardware pipeline, and improve models in other ways. And for your point about novel discoveries, previous Alpha models seemed to already be able to do the same categories of research back in 2023, on mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare with previous models of the same classification.
This is also a very small thing to keep in mind, but GDM models don’t often share the actual results of their models’ work as usable/replicable papers, which has caused experts to cast some doubts on results in the past. It’s hard to verify their results, since they’ll be keeping them close to their chests.
Unfortunate consequence of sycophantic ~intelligent chatbots: everyone can get their theories parroted back to them and validated. Particularly risky for AGI, where the chatbot can even pretend to be running your cognitive architecture. Want to build a neuro-quantum-symbolic-emergent-consciousness-strange-loop AGI? Why bother, when you can just put that all in a prompt!
A lot of new user submissions these days to LW are clearly some poor person who was sycophantically encouraged by an AI to post their crazy theory of cognition or consciousness or recursion or social coordination on LessWrong after telling them their ideas are great. When we send them moderation messages we frequently get LLM-co-written responses, and sometimes they send us quotes from an AI that has evaluated their research as promising and high-quality as proof that they are not a crackpot.
Basic sanity check: We can align human children, but can we align any other animals? NOT to the extent that we would trust them with arbitrary amounts of power, since they obviously aren’t smart enough for this question to make much sense. Just like, are there other animals that we’ve made care about us at least “a little bit?” Can dogs be “well trained” in a way where they actually form bonds with humans and will go to obvious personal risk to protect us, or not eat us even if they’re really hungry and clearly could? How about species further on the evolutionary tree like hunting falcons? Where specifically is the line?
Has any LLM with fixed scaffolding beaten Pokémon end to end with no hints?
As well as the “theoretical—empirical” axis, there is an “idealized—realistic” axis. The former distinction is about the methods you apply (with extremes exemplified by rigorous mathematics and blind experimentation, respectively). The latter is a quality of your assumptions / paradigm. Highly empirical work is forced to be realistic, but theoretical work can be more or less idealized. Most of my recent work has been theoretical and idealized, which is the domain of (de)confusion. Applied research must be realistic, but should pragmatically draw on theory and empirical evidence. I want to get things done, so I’ll pivot in that direction over time.
Sometimes I wonder if people who obsess over the “paradox of free will” are having some “universal human experience” that I am missing out on. It has never seemed intuitively paradoxical to me, and all of the arguments about it seem either obvious or totally alien. Learning more about agency has illuminated some of the structure of decision making for me, but hasn’t really affected this (apparently) fundamental inferential gap. Do some people really have this overwhelming gut feeling of free will that makes it repulsive to accept a lawful universe?
I used to, as a child. I did accept a lawful universe, but I thought my perception of free will was in tension with that, so that perception must be “an illusion”.
My mother kept trying to explain to me that there was no tension between these things, because it was correct that my mind made its own decisions rather than some outside force. I didn’t understand what she was saying though. I thought she was just redefining ‘free will’ from a claim that human brains effectively had a magical ability to spontaneously ignore the laws of physics to a boring tautological claim that human decisions are made by humans rather than something else.
I changed my mind on this as a teenager. I don’t quite remember how, it might have been the sequences or HPMOR again. I realised that my imagination had still been partially conceptualising the “laws of physics” as some sort of outside force, a set of strings pulling my atoms around, rather than as a predictive description of me and the universe. Saying “the laws of physics make my decisions, not me” made about as much sense as saying “my fingers didn’t move, my hand did.” That was what my mother had been trying to tell me.
I don’t think so, as I have had success explaining away the paradox with the concept of “different levels of detail”—saying that free will is a very high-level concept and further observations reveal a lower-level view, calling upon an analogy with algorithmic programming’s segment tree.
(A segment tree is a data structure that replaces an array, allowing one to modify its values and to compute a given function over ranges of array elements efficiently. It is based on a tree of nodes, each representing a certain subarray; each position is therefore handled by several—specifically, O(log n)—nodes.)
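For readers unfamiliar with the analogy, here is a minimal Python sketch of the data structure (range-sum version; any associative operation works the same way):

```python
class SegmentTree:
    """Array replacement supporting point updates and range queries in O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        # Leaves store the array itself; node i aggregates its children 2i and 2i+1.
        self.tree[self.n:] = values
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, pos, value):
        # Only the O(log n) nodes whose subarray contains `pos` need recomputing.
        i = pos + self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, left, right):
        # Sum over the half-open range [left, right), combining O(log n) nodes.
        total = 0
        lo, hi = left + self.n, right + self.n
        while lo < hi:
            if lo & 1:
                total += self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total


st = SegmentTree([5, 2, 7, 1])
assert st.query(1, 3) == 9   # 2 + 7
st.update(2, 10)
assert st.query(0, 4) == 18  # 5 + 2 + 10 + 1
```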
This might be related to whether you see yourself as a part of the universe, or as an observer. If you are an observer, the objection is like “if I watch a movie, everything in the movie follows the script, but I am outside the movie, therefore outside the influence of the script”.
If you are religious, I guess your body is a part of the universe (obeys the laws of gravity etc.), but your soul is the impartial observer. Here the religion basically codifies the existing human intuitions.
It might also depend on how much you are aware of the effects of your environment on you. This is a learned skill; for example little kids do not realize that they are hungry… they just get kinda angry without knowing why. It requires some learning to realize “this feeling I have right now—it is hunger, and it will probably go away if I eat something”. And I guess the more knowledge of this kind you accumulate, the easier it is to see yourself as a part of the universe, rather than being outside of it and only moved by “inherently mysterious” forces.
GPT-5’s insight on Scott Aaronson’s research problem (which I posted about) seems to be a weaker signal than I believed, see the update:
https://www.lesswrong.com/posts/RnKmRusmFpw7MhPYw/cole-wyeth-s-shortform?commentId=E5QCkGcs4eYoJNwhn
Self-reflection allows self-correction.
If you can fit yourself inside your world model, you can also model the hypothesis that you are wrong in some specific systematic way.
A partial model is a self-correction, because it says “believe as you will, except in such a case.”
This is the true significance of my results with @Daniel C:
https://www.lesswrong.com/posts/Go2mQBP4AXRw3iNMk/sleeping-experts-in-the-reflective-solomonoff-prior
That is, reflective oracles allow Solomonoff induction to think about ways of becoming less wrong.
If instead of building LLMs, tech companies had spent billions of dollars designing new competing search engines that had no ads but might take a few minutes to run and cost a few cents per query, would the result have been more or less useful?
Rather less useful to me personally as a software developer.
Besides that, I feel like this question is maybe misleading? If ex. Google built a new search engine that could answer queries like its current AI-powered search summaries, or like ChatGPT, wouldn’t that have to be some kind of language model anyway? Is there another class of thing besides AGI that could perform as well at that task?
(I assume you’re not suggesting just changing the pricing model of existing-style search engines, which already had a market experiment (ex. Kagi) some years ago with only mild success.)
I am thinking it would NOT answer like its current AI-powered search summaries, but would rather order actual search results, just VERY intelligently.
I think that would require text comprehension too. I guess it’s an interesting question if you can build an AI that can comprehend text but not produce it?
My impression is that the decline of search engines has little to do with search ads. It has more to do with a decline in public webpage authoring in favor of walled gardens, chat systems, etc.: new organic human-written material that once would have been on a public forum site (or home page!) is today often instead in an unindexable Discord chat or inside an app. Meanwhile, spammy content on the public Web has continued to escalate; and now LLMs are helping make more and more of it.
But most of LLMs’ knowledge comes from the public Web, so clearly there is still a substantial amount of useful content on it, and maybe if search engines had remained good enough at filtering spam fewer people would have fled to Discord.
More useful. It would save us the step of having to check for hallucinations when doing research.
To what extent would a proof about AIXI’s behavior be normative advice?
Though AIXI itself is not computable, we can prove some properties of the agent—unfortunately, there are fairly few examples because of the “bad universal priors” barrier discovered by Jan Leike. In the sequential case we only know things like, e.g., that it will not indefinitely keep trying an action that yields minimal reward, though we can say more when the horizon is 1 (which reduces to the predictive case in a sense). And there are lots of interesting results about the behavior of Solomonoff induction (roughly speaking, the predictive part of AIXI).
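For readers who haven't seen it, here is the standard one-line definition of AIXI's action choice (following Hutter's formulation; notation may differ slightly from the sources discussed here), where m is the horizon, q ranges over programs for a universal monotone Turing machine U, and ℓ(q) is the length of q:

```latex
a_t \;:=\; \arg\max_{a_t} \sum_{o_t r_t} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\big[\, r_t + \cdots + r_m \,\big]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The properties mentioned above are theorems about this expectimax expression, proved without being able to compute it.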
For the sake of argument though, assume we could prove some (more?) interesting statements about AIXI’s strategy—certainly this is possible for us computable beings. But would we want to take those statements as advice, or are we too ignorant to benefit from cargo-culting an inscrutable demigod like AIXI?
Can AI X-risk be effectively communicated by analogy to climate change? That is, the threat isn’t manifesting itself clearly yet, but experts tell us it will if we continue along the current path.
Though there are various disanalogies, this specific comparison seems both honest and likely to be persuasive to the left?
I don’t like it. Among various issues, people already muddy the waters by erroneously calling climate change an existential risk (rather than what it was, a merely catastrophic one, before AI timelines made any worries about climate change in the year 2100 entirely irrelevant), and it’s extremely partisan-coded. And you’re likely to hear that any mention of AI x-risk is a distraction from the real issues, which are whatever the people cared about previously.
I prefer an analogy to gain-of-function research. As in, scientists grow viruses/AIs in the lab, with promises of societal benefits, but without any commensurate acknowledgment of the risks. And you can’t trust the bio/AI labs to manage these risks, e.g. even high biosafety levels can’t entirely prevent outbreaks.
I agree that there is a consistent message here, and I think it is one of the most practical analogies, but I get the strong impression that tech experts do not want to be associated with environmentalists.
I think it would be persuasive to the left, but I’m worried that comparing AI x-risk to climate change would make it a left-wing issue to care about, which would make right-wingers automatically oppose it (upon hearing “it’s like climate change”).
Generally it seems difficult to make comparisons/analogies to issues that (1) people are familiar with and think are very important and (2) not already politicized.
I’m looking at this not from a CompSci point of view but from a rhetoric point of view: Isn’t it much easier to make tenuous or even flat-out wrong links between climate change and highly publicized natural disaster events that have lots of dramatic, visceral footage than it is to ascribe danger to a machine that hasn’t been invented yet, that we don’t know the nature or inclinations of?
I don’t know about nowadays, but the two main pop-culture touchstones for me for “evil AI” are Skynet in Terminator and HAL 9000 in 2001: A Space Odyssey (and, by inversion, the Butlerian Jihad in Dune). Wouldn’t it be more expedient to leverage those? (Expedient—I didn’t say accurate.)
Most ordinary people don’t know that no one understands how neural networks work (or even that modern “Generative A.I.” is based on neural networks). This might be an underrated message since the inferential distance here is surprisingly high.
It’s hard to explain the more sophisticated models that we often use to argue that human disempowerment is the default outcome; our effort is perhaps much better leveraged explaining these three points:
1) No one knows how A.I. models / LLMs / neural nets work (with some explanation of how this is conceptually possible).
2) We don’t know how smart they will get, or how soon.
3) We can’t control what they’ll do once they’re smarter than us.
At least under my state of knowledge, this is also a particularly honest messaging strategy, because it emphasizes the fundamental ignorance of A.I. researchers.
“Optimization power” is not a scalar multiplying the “objective” vector. There are different types. It’s not enough to say that evolution has had longer to optimize things but humans are now “better” optimizers: Evolution invented birds and humans invented planes, evolution invented mitochondria and humans invented batteries. In no case is one really better than the other—they’re radically different sorts of things.
Evolution optimizes things in a massively parallel way, so that they’re robustly good at lots of different selectively relevant things at once, and has been doing this for a very long time so that inconceivably many tiny lessons are baked in a little bit. Humans work differently—we try to figure out what works for explainable, preferably provable reasons. We also blindly twiddle parameters a bit, but we can only keep so many parameters in mind at once and compare so many metrics—humanity has a larger working memory than individual humans, but the human innovation engine is still driven by linguistic theories, expressed in countable languages. There must be a thousand deep mathematical truths that evolution is already taking advantage of to optimize its DNA repair algorithms, or design wings to work very well under both ordinary and rare turbulent conditions, or minimize/maximize surface tensions of fluids, or invent really excellent neural circuits—without ever actually finding the elaborate proofs. Solving for exact closed-form solutions is often incredibly hard, even when the problem can be well-specified, but natural selection doesn’t care. It will find what works locally, regardless of logical depth. It might take humans thousands of years to work some of these details out on paper. But once we’ve worked something out, we can deliberately scale it further and avoid local minima. This distinction in strategies of evolution vs. humans rhymes with wisdom vs. intelligence—though in this usage intelligence includes all the insight, except insofar as evolution located and acts through us. As a sidebar, I think some humans prefer an intuitive strategy that is more analogous to evolution’s in effect (but not implementation).
So what about when humans turn to building a mind? Perhaps a mind is by its nature something that needs to be robust, optimized in lots of little nearly inexplicable ways for arcane reasons to deal with edge cases. After all, isn’t a mind exactly that which provides an organism/robot/agent with the ability to adapt flexibly to new situations? A plane might be faster than a bird, throwing more power at the basic aerodynamics, but it is not as flexible—can we scale some basic principles to beat out brains with the raw force of massive energy expenditure? Or is intelligence inherently about flexibility, and impossible to brute force in that way? Certainly it’s not logically inconsistent to imagine that flexibility itself has a simple underlying rule—as a potential existence proof, the mechanics of evolutionary selection are at least superficially simple, though we can’t literally replicate it without a fast world-simulator, which would be rather complicated. And maybe evolution is not a flexible thing, but only a designer of flexible things. So neither conclusion seems like a clear winner a priori.
The empirical answers so far seem to complicate the story. Attempts to build a “glass box” intelligence out of pure math (logic or probability) have so far not succeeded, though they have provided useful tools and techniques (like statistics) that avoid the fallacies and biases of human minds. But we’ve built a simple outer loop optimization target called “next token prediction” and thrown raw compute at it, and managed to optimize black box “minds” in a new way (called gradient descent by backpropagation). Perhaps the process we’ve captured is a little more like evolution, designing lots of little tricks that work for inscrutable reasons. And perhaps it will work, woe unto us, who have understood almost nothing from it!
The most common reason I don’t use LLMs for stuff is that I don’t trust them. Capabilities are somewhat bottlenecked on alignment.
Human texts also need reasons to trust the takeaways from them, things like bounded distrust from reputational incentives, your own understanding after treating something as steelmanning fodder, expectation that the authors are talking about what they actually observed. So it’s not particularly about alignment with humans either. Few of these things apply to LLMs, and they are not yet good at writing legible arguments worth verifying, though IMO gold is reason to expect this to change in a year or so.
LLM coding assistants may actually slow developers down, contrary to their expectations:
https://www.lesswrong.com/posts/9eizzh3gtcRvWipq8/measuring-the-impact-of-early-2025-ai-on-experienced-open
(Epistemic status: I am signal boosting this with an explicit one-line summary that makes clear it is bearish for LLMs, because scary news about LLM capability acceleration is usually more visible/available than this update seems to be. Read the post for caveats.)
Optimality is about winning. Rationality is about optimality.
I guess Dwarkesh believes ~everything I do about LLMs and still thinks we probably get AGI by 2032:
https://www.dwarkesh.com/p/timelines-june-2025
@ryan_greenblatt made a claim that continual learning/online training can already be done, but that right now the returns aren’t super high and it requires annoying logistical/practical work, and that the current AI bottlenecks are elsewhere, like sample efficiency and robust self-verification.
That would explain why the likelihood of getting AGI by the 2030s is pretty high:
https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/#pEBbFmMm9bvmgotyZ
Ryan Greenblatt’s original comment:
https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/#xMSjPgiFEk8sKFTWt
What are your timelines?
My distribution is pretty wide, but I think probably not before 2040.
This is not the kind of news I would have expected from short timeline worlds in 2023: https://www.techradar.com/computing/artificial-intelligence/chatgpt-is-getting-smarter-but-its-hallucinations-are-spiraling
I still don’t think that a bunch of free-associating inner monologues talking to each other gives you AGI, and it still seems to be an open question whether adding RL on top just works.
The “hallucinations” of the latest reasoning models look more like capability failures than alignment failures to me, and I think this points towards “no.” But my credences are very unstable; if METR task length projections hold up or the next reasoning model easily zero-shots Pokemon I will just about convert.
Investigating preliminary evaluations of o3 and o4-mini, I am more convinced that task length is scaling as projected.
Pokémon has fallen, but as far as I can tell this relied on scaffolding improvements for Gemini 2.5 pro customized during the run, NOT a new smarter model.
Overall, I am already questioning my position one week later.
Pokémon is actually load-bearing for your models? I’m imagining a counterfactual world in which Sonnet 3.7’s initial report involved it beating Pokémon Red, and I don’t think my present-day position would’ve been any different in it.
Even aside from tons of walkthrough information present in LLMs’ training set, and iterative prompting allowing one to identify and patch holes in LLMs’ pretrained instinctive game knowledge, Pokémon is simply not a good test of open-ended agency. At the macro-scale, the game state can only progress forward, and progressing it requires solving relatively closed-form combat/navigational challenges. Which means that as long as you’re not too unlikely to blunder through each of those isolated challenges, you’re fated to “fail upwards”. The game-state topology doesn’t allow you to progress backward or get stuck in a dead end: you can’t lose a badge or un-win a boss battle. I.e., there’s basically an implicit “long-horizon agency scaffold” built into the game.
Which means what this tests is mainly the ability to solve somewhat-diverse isolated challenges in sequence. But not the ability to autonomously decompose long-term tasks into said isolated challenges in a way such that the sequence of isolated challenges implacably points at the long-term task’s accomplishment.
Hmm, maybe I’m suffering from having never played Pokémon… who would’ve thought that could be an important hole in my education?
I think the hallucinations/reward hacking are actually a real alignment failure, but an alignment failure that happens to degrade capabilities a lot. At least some of the misbehavior is probably due to context, but I have seen evidence that the alignment failures are more deliberate than regular capability failures.
That said, if this keeps happening, the likely answer is that capabilities progress is to a significant degree bottlenecked on alignment progress, such that you need a significant degree of progress on preventing specification gaming to get new capabilities. This would definitely be a good world for misalignment issues if the hypothesis is true (which I put some weight on).
(Also, it’s telling that the areas where RL has worked best are areas where you can basically create unhackable reward models, like many games/puzzles, and once reward hacking is on the table, capabilities start to decrease.)
GDM has a new model: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#advanced-coding
At a glance, it is (pretty convincingly) the smartest model overall. But progress still looks incremental, and I continue to be unconvinced that this paradigm scales to AGI. If so, the takeoff is surprisingly slow.
I’m worried about Scott Aaronson since he wrote “Deep Zionism.”
https://scottaaronson.blog/?p=9082
I think he’s coming from a good place, I can understand how he got here, but he really, really needs to be less online.
That moment when you’ve invested in building a broad and deep knowledge base instead of your own agency and then LLMs are invented.
it hurts
I don’t see it that way. Broad and deep knowledge is as useful as ever, and LLMs are no substitutes for it.
This anecdote comes to mind:
This fits with my experience. If you’re trying to do some nontrivial research or planning, you need to have a vast repository of high-quality mental models of diverse phenomena in your head, able to be retrieved in a split-second and immediately integrated into your thought process. If you need to go ask an LLM about something, this breaks the flow state, derails your trains of thought, and just takes dramatically more time. Not to mention unknown unknowns: how can you draw on an LLM’s knowledge about X if you don’t even know that X is a thing?
IMO, the usefulness of LLMs is in improving your ability to build broad and deep internal knowledge bases, rather than in substituting these internal knowledge bases.
This is probably right. Though perhaps one special case of my point remains correct: the value of a generalist as a member of a team may be somewhat reduced.
The value of a generalist with shallow knowledge is reduced, but you get a chance to become a generalist with relatively deep knowledge of many things. You already know the basics, so you can start the conversation with LLMs to learn more (and knowing the basics will help you figure out when the LLM hallucinates).
Back-of-the-envelope math indicates that an ordinary NPC in our world needs to double their power like 20 times over to become a PC. That’s a tough ask. I guess the lesson is either give up or go all in.
Can you expand on this? I’m not sure what you mean but am curious about it.
There are around 8 billion humans, so an ordinary person has a very small fraction of the power needed to steer humanity in any particular direction. A very large number of doublings are required to be a relevant factor.
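A quick sanity check of that figure (my arithmetic, under the crude simplification that "power" just scales with headcount):

```python
# Back-of-the-envelope check of the "~20 doublings" figure.
# Simplifying assumption (mine): power is measured in units of one average person.
population = 8e9               # roughly 8 billion people
doublings = 20
power = 2 ** doublings         # ~1.05 million average-person units
share = power / population     # ~1.3e-4, i.e. about 0.01% of humanity
print(power, share)
```

Twenty doublings takes one average person to roughly the combined power of a million, about 0.01% of humanity, which seems like a plausible threshold for counting as "a relevant factor".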
That’s an interesting idea. However, people who read these comments probably already have power much greater than the baseline—a developed country, high intelligence, education, enough money and free time to read websites...
Not sure how many of those 20 doublings still remain.
I thought the statement was pretty clearly not about the average lesswronger.
But in terms of the “call to action”: 20 was pretty conservative, so I think it’s still in that range, and doesn’t change the conclusions one should draw much.
That moment when you want to be updateless about risk but updateful about ignorance, but the basis of your epistemology is to dissolve the distinction between risk and ignorance.
(Kind of inspired by @Diffractor)
Did a podcast interview with Ayush Prakash on the AIXI model (and modern AI), very introductory/non-technical:
Some errata:
The bat thing might have just been Thomas Nagel, I can’t find the source I thought I remembered.
At one point I said LLMs forget everything they thought previously between predicting (say) token six and seven and have to work from scratch. Because of the way the attention mechanism works, it is actually a little more complicated (see the top comment from hmys). What I said is (I believe) still overall right, but I would state that detail less strongly; see the toy sketch after these errata.
Hofstadter apparently was the one who said a human-level chess AI would rather talk about poetry.
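On the attention-mechanism caveat in the second erratum: the sketch below (my own toy with random weights and a single attention head, not anyone's actual model code) is meant to show in what sense per-token state does carry over. Each decoding step only embeds the newest token, but the keys and values computed at earlier steps stay in a cache and get attended to again.

```python
# Toy single-head attention decoder with a KV cache (illustration only).
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))
W_out = rng.normal(size=(d_model, vocab))
embed = rng.normal(size=(vocab, d_model))

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def decode_step(token, cache):
    """Process one new token, reusing cached keys/values from all earlier steps."""
    x = embed[token]                      # embed only the newest token
    cache["k"].append(x @ W_k)            # its key/value join the cache...
    cache["v"].append(x @ W_v)
    q = x @ W_q                           # ...and its query attends over ALL cached positions
    K, V = np.stack(cache["k"]), np.stack(cache["v"])
    attn = softmax(K @ q / np.sqrt(d_model))
    h = attn @ V
    return int(np.argmax(h @ W_out))      # greedy next-token choice (toy)

cache = {"k": [], "v": []}
tok = 1
for step in range(7):
    tok = decode_step(tok, cache)

# After the loop, the cache holds one key/value pair per token processed so far.
# That per-position state persists between steps, even though nothing like a
# separate scratchpad of the model's prior "thoughts" does.
```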
Garry Kasparov would beat me at chess in some way I can’t predict in advance. However, if the game starts with half his pieces removed from the board, I will beat him by playing very carefully. The first above-human-level A.G.I. seems overwhelmingly likely to be down a lot of material—massively outnumbered, running on our infrastructure, starting with access to pretty crap/low-bandwidth actuators in the physical world and no legal protections (yes, this actually matters when you’re not as smart as ALL of humanity—it’s a disadvantage relative to even the average human). If we exercise even a modicum of competence it will also be even tougher (e.g. an air gap, dedicated slightly weaker controllers, exposed thoughts at some granularity). If the chess metaphor holds we should expect the first such A.G.I. not to beat us—but it may well attempt to escape under many incentive structures. Does this mean we should expect to have many tries to solve alignment?
If you think not, it’s probably because of some dis-analogy with chess. For instance, the search space in the real world is much richer, and maybe there are always some “killer moves” available if you’re smart enough to see them e.g. invent nanotech. This seems to tie in with people’s intuitions about A) how fragile the world is and B) how g-loaded the game of life is. Personally I’m highly uncertain about both, but I suspect the answers are “somewhat.”
I would guess that an A.G.I. that only wants to end the world might be able to pull it off with slightly superhuman intelligence, which is very scary to me. But I think it would actually be very hard to bootstrap all singularity-level infrastructure from a post-apocalyptic wasteland, so perhaps this is actually not a convergent instrumental subgoal at this level of intelligence.
Is life actually much more g-loaded than chess? In terms of how far you can in principle multiply your material, unequivocally yes. However, life is also more stochastic—I will never beat Garry Kasparov in a fair game, but if Jeff Bezos and I started over with ~0 dollars and no name recognition / average connections today, I think there’s a good >1% chance I’m richer in a year. It’s not immediately clear to me which view is more relevant here.
I suspect that human minds are vast (more like little worlds of our own than clockwork baubles) and even a superintelligence would have trouble predicting our outputs accurately from even quite a few conversations (without direct microscopic access), as a matter of sample complexity.
Considering the standard rhetoric about boxed A.I.’s, this might have belonged in my list of heresies: https://www.lesswrong.com/posts/kzqZ5FJLfrpasiWNt/heresies-in-the-shadow-of-the-sequences
There is a large body of non-AI literature that already addresses this, for example the research of Gerd Gigerenzer, which shows that often heuristics and “fast and frugal” decision trees substantially outperform fine-grained analysis because of the sample complexity matter you mention.
Pop frameworks which elaborate on this, and how it may be applied, include Dave Snowden’s Cynefin framework, which is geared for government and organizations, and of course Nassim Nicholas Taleb’s Incerto.
I seem to recall also that the gist of Dunbar’s Number, and the reason why certain parrots and corvids seem to have larger prefrontal-cortex equivalents than non-monogamous birds, is basically so that they can have an internal model of their mating partner. (This is very interesting to think about in terms of intimate human relationships, what I’d poetically describe as the “telepathy” when you wordlessly communicate, intuit, and predict a wide range of each other’s complex and specific desires and actions because you’ve spent enough time together).
The scary thought to me is that a superintelligence would quite simply not need to accurately model us; it would just need to fine-tune its models in a way not dissimilar from the psychographic models utilized by marketers. Of course, that operates at scale, so the margin of error is much greater but more ‘acceptable’.
Indeed, dumb algorithms already do this very well—think about how ‘addictive’ people claim their TikTok or Facebook feeds are: the rudimentary sensationalist clickbait that ensures eyeballs and clicks. A superintelligence doesn’t need accurate modelling, and this is without having individual conversations with us. To my knowledge (or rather my experience), most social media algorithms are really bad at taking the information on your profile and using things like sentiment and discourse analysis to make decisions about which content to feed you; they rely on engagement like sharing, clicking like, watch time, and rudimentary metrics like that. Similarly, the content creators are often casting a wide net and using formulas to produce this content.
A superintelligence, I wager, would not need accuracy yet would still be capable of psychological tactics geared to the individual that the Stasi who operated Zersetzung could only dream of. Marketers must be drooling at the possibility of finding orders-of-magnitude more effective marketing campaigns that would make one-to-one sales obsolete.
One can showcase very simple examples of data that are easy to generate (a simple data source) yet very hard to predict.
E.g., there is a 2-state generating hidden Markov model whose optimal prediction hidden Markov model is infinite.
I’ve heard it explained as follows: it’s much harder for the fox to predict where the hare is going than it is for the hare to decide where to go to shake off the fox.
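For anyone who wants to poke at the claim, here is a sketch using what I believe is the usual example (the "simple nonunifilar source" from the computational-mechanics literature; treat the specific transition probabilities and the known-initial-state assumption as mine). The generator has only two hidden states, but an optimal predictor must track its posterior belief over those states, and the set of reachable beliefs keeps growing with history length, which is the sense in which the optimal predictive model is infinite:

```python
# Count distinct posterior beliefs reachable by histories of increasing length.
import itertools
import numpy as np

# T[x][i, j] = P(emit symbol x and move to state j | currently in state i)
T = {
    0: np.array([[0.5, 0.0],    # state A: emit 0, stay in A
                 [0.0, 0.0]]),  # state B never emits 0
    1: np.array([[0.0, 0.5],    # state A: emit 1, go to B
                 [0.5, 0.5]]),  # state B: emit 1, go to A or stay in B
}

def beliefs_after(length, start=np.array([1.0, 0.0])):
    """All distinct posterior beliefs over hidden states reachable by length-L histories,
    assuming (for simplicity) the process starts in state A with certainty."""
    seen = set()
    for history in itertools.product([0, 1], repeat=length):
        b = start
        for x in history:
            b = b @ T[x]
            total = b.sum()
            if total == 0:      # impossible history under this source, skip it
                break
            b = b / total
        else:
            seen.add(tuple(np.round(b, 10)))
    return seen

for L in range(1, 12):
    print(L, len(beliefs_after(L)))   # the count keeps growing with L
```

Running it, the number of distinct belief states grows linearly in the history length (each extra run of 1s creates a new belief that no shorter history produces), so no finite-state predictor can be exactly optimal for this 2-state generator.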
I’m starting a Google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues that are (infrequently) added—also, some people outside of LessWrong are interested.
Presented the Sherlockian abduction master list at a Socratica node:
Michael K. Cohen is giving a talk on AI safety (through KL-regularization to a safe policy) at the AIXI research meeting in a couple of hours (4 pm ET): https://uaiasi.com/2025/09/18/upcoming-talk-from-michael-k-cohen-and-logistical-changes/
If you’re interested, I highly recommend dropping in: https://uwaterloo.zoom.us/j/7921763961?pwd=TDatET6CBu47o4TxyNn9ccL2Ia8HN4.1
Simple argument that imitation learning is the easiest route to alignment:
Any AI aligned to you needs to represent you in enough detail to fully understand your preferences / values, AND maintain a stable pointer to that representation of value (that is, it needs to care). The second part is surprisingly hard to get exactly right.
Imitation learning basically just does the first part—it builds a model of you, which automatically contains your values, and by running that model optimizes your values in the same way that you do. This has to be done faithfully for the approach to work safely—the model has to continue acting like you would in new circumstances (out of distribution) and when it runs for a long time—which is nontrivial.
That is, faithful imitation learning is kind of alignment-complete: it solves alignment, and any other solution to alignment kind of has to solve imitation learning implicitly, by building a model of your preferences.
I think people (other than @michaelcohen) mostly haven’t realized this for two reasons: the idea doesn’t sound sophisticated enough, and it’s easy to point at problems with naive implementations.
Imitation learning is not a new idea so you don’t sound very smart or informed by suggesting it as a solution.
And implementing it faithfully does face barriers! You have to solve “inner optimization problems” which basically come down to the model generalizing properly, even under continual / lifelong learning. In other words, the learned model should be a model in the strict sense of simulation (perhaps at some appropriate level of abstraction). This really is hard! And I think people assume that anyone suggesting imitation learning can be safe doesn’t appreciate how hard it is. But I think it’s hard in the somewhat familiar sense that you need to solve a lot of tough engineering and theory problems—and a bit of philosophy. However it’s not as intractably hard as solving all of decision theory etc. I believe that with a careful approach, the capabilities of an imitation learner do not generalize further than its alignment, so it is possible to get feedback from reality and iterate—because the model’s agency is coming from imitating an agent which is aligned (and with care, is NOT emergent as an inner optimizer).
Also, you still need to work out how to let the learned model = hopefully simulation of a human recursively self improve safely. But notice how much progress has already been made at this point! If you’ve got a faithful simulation of a human, you’re in a very different and much better situation. You can run that simulation faster as technology advances, meaning you aren’t immediately left in the dust by LLM scaling—you can have justified trust in an effectively superhuman alignment researcher. And recursive self improvement is probably easier than alignment from scratch.
I think we need to take this strategy a lot more seriously.
Here’s a longer sketch of what this should look like: https://www.lesswrong.com/posts/AzFxTMFfkTt4mhMKt/alignment-as-uploading-with-more-steps
Thinking times are now long enough that in principle frontier labs could route some API (or chat) queries to a human on the backend, right? Is this plausible? Could this give them a hype advantage in the medium term if they picked the most challenging (for LLMs) types of queries effectively, and if so, is there any technical barrier? I can see this kind of thing eventually coming out, if the Wentworth “it’s bullshit though” frame turns out to be partially right.
(I’m not suggesting they would do this kind of blatant cheating on benchmarks, and I have no inside knowledge suggesting this has ever happened)
In MTG terms, I think Mountainhead is the clearest example I’ve seen of a mono-blue dystopia.
@Duncan Sabien (Inactive)
I seem to recall EY once claiming that insofar as any learning method works, it is for Bayesian reasons. It just occurred to me that even after studying various representation and complete class theorems I am not sure how this claim can be justified—certainly one can construct working predictors for many problems that are far from explicitly Bayesian. What might he have had in mind?
A “Christmas edition” of the new book on AIXI is freely available in pdf form at http://www.hutter1.net/publ/uaibook2.pdf
Over-fascination with beautiful mathematical notation is idol worship.
So is the fascination with applying math to complex real-world problems (like alignment) when the necessary assumptions don’t really fit the real-world problem.
(Not “idle worship”?)
Beauty of notation is an optimization target and so should fail as a metric, but especially compared to other optimization targets I’ve pushed on, in my experience it seems to hold up. The exceptions appear to be string theory and category theory, and two failures in a field the size of math is not so bad.
I wonder if it’s true that around the age of 30 women typically start to find babies cute and consequently want children, and if so, is this cultural or evolutionary? It’s sort of against my (mesa-optimization) intuitions for evolution to act on such high-level planning (it seems that finding babies cute can only lead to reproductive behavior through pretty conscious intermediary planning stages). Relatedly, I wonder if men typically have a basic urge to father children, beyond immediate sexual attraction?
Eliezer’s form of moral realism about good (as a real but particular shared concept of value which is not universally compelling to minds) seems to imply that most of us prefer to be at least a little bit evil, and can’t necessarily be persuaded otherwise through reason.
Seems right.
And Nietzsche would probably argue the two impulses towards good and evil aren’t really opposites anyway.
Levels of gender discourse:
1: Obviously a man is a man and a woman is a woman lol
2: Gender is a social construct so anyone can be a man or a woman (you transphobic moron)
3: (Typical) Men and women are biologically and psychologically different along various axes
4: “The categories were made for man”
5: “The categories were made for man to make predictions”
6: Wait, this question only seems salient to me because I’m driving a flesh-robot evolved under contingent conditions. The regularities in human sex-associated categories are of very little importance to me on reflection, except as a proxy for (others’) sanity. I should shut up about gender and focus on solving X-risk.
7: ok, but a lot of why I want to solve x-risk is that I have a positive view of a human+ world where people have radical transhumanism of various kinds available, including being able to do things like full-functioning style transfer of their body in various currently-weird ways, which goes far beyond gender into things like: what if I replace my DNA with something more efficient that keeps everything beautiful about DNA but works at 3 kelvin in deep space while I make art in orbit of Pluto for the next 10,000 years, and also I still want to look like an attractive ape even though at that point I won’t really be one
I don’t get why this take was so much more popular than my post though—like, do people really think that winning the culture war on gender will affect whether we can have a transhumanist future after the singularity? This does not make sense to me.
I’d guess it’s levels of political spam machine derived messaging, scissor-statement-ish stuff. I strong agree’d and strong downvoted your post because of that. I guess mine was sufficiently galaxy brained to dodge the association somewhat? also, like, I’m pitching something to look forward to—a thing to seek, in addition to the normal thing to avoid. and it seems like a lot of the stuff that backs the current disagreements is more obviously solved if you have tech advanced enough to do what I was describing. idk though.
Being able to do weird stuff with body swapping is like 2.5% of the reason I want to solve x-risk, but more power to you—by choosing increasing natural number labels, I tacitly implied the existence of higher levels, and if this is yours then go for it :)