Charlie Steiner comments on Your LLM-assisted scientific breakthrough probably isn’t real

Charlie Steiner 2 Sep 2025 23:46 UTC
72 points
46
“In order for your ideas to qualify as science, you need to a) formulate a specific, testable, quantitative hypothesis, b) come up with an experiment that will empirically test whether that hypothesis is true, c) preregister what your hypothesis predicts about the results of that experiment (free at OSF), and d) run the experiment and evaluate the results. All of those steps are important! Try to do them in a way that will make it easy to communicate your results. Try to articulate the hypothesis in a clear, short way, ideally in a couple of sentences. Design your experiment to be as strong as possible. If your hypothesis is false, then your experiment should show that; the harder it tries to falsify your hypothesis, the more convincing other people will find it. Always ask yourself what predictions your theory makes that other theories don’t, and test those. Preregister not just the details of the experiment, but how you plan to analyze it; use the simplest analysis and statistics that you expect to work.”
I think this is the weakest part of the essay, both as philosophy of science and as communication to the hopefully-intended audience.
“Qualifying as science” is not about jumping through a discrete set of hoops. Science is a cultural process where people work together to figure out new stuff, and you can be doing science in lots of ways that don’t fit onto the grade-school “The Scientific Method” poster.
a) You can be doing science without formulating a hypothesis—e.g. observational studies / fishing expeditions, making phenomenological fits to data, building new equipment. If you do have a hypothesis, it doesn’t have to be specific (it could be a class of hypotheses), it doesn’t have to be testable (it’s science to make the same observable predictions as the current leading model in a simpler way), and it doesn’t have to be quantitative (you can do important science just by guessing the right causal structure without numbers).
b) You can be doing science without coming up with an experiment (mainly when you’re trying to explain existing results, or when doing any of the non-hypothesis-centric science mentioned earlier).
c) If you do have a hypothesis and experiment in that order, public pre-registration is virtuous but not required to be science. Private pre-registration, in the sense that you know what your hypothesis predicts, is a simple consequence of doing step (b), and can be skipped when step (b) doesn’t apply.
d) Experiments are definitely science! But you can be doing science without them, e.g. if you do steps a-c and leave step d for other people, that can be science.
From a communication perspective, this reads as setting up unrealistic standards of what it takes to “qualify as science,” and then using them as a bludgeon against the hopefully-intended audience of people who think they’ve made an LLM-assisted breakthrough. Such an audience might feel like they were being threatened or excluded, like these standards were just there to try to win an argument.
Although, even if that’s true, steps (a)-(d) do have an important social role: they’re a great way to convince people (scientists included) without those other people needing to do much work. If you have an underdog theory that other scientists scoff at, but you do steps (a)-(d), many of those scoffers will indeed sit up and take serious notice.
But normal science isn’t about a bunch of solo underdogs fighting it out to collate data, do theoretical work, and run experiments independently of each other. Cutting-edge science is often too hard for that even to be reasonable. It’s about people working together, each doing their part to make it easier for other people to do their own parts.
This isn’t to say that there aren’t standards you can demand of people who think they’ve made a breakthrough. And those standards can be laborious, and even help you win the argument! It just means standards, and the advice about how to meet them, have to be focused more on helping people participate in the cultural process where people work together to figure out new stuff.
A common ask of people who claim to have made advances: do they really know what the state of the art is, in the field they’ve supposedly advanced? You don’t have to know everything, but you have to know a lot! If you’re advancing particle physics, you’d better know the standard model and the mathematics required to operate it. And if there’s something you don’t know about the state of the art, you should just be a few steps away from learning it on your own (e.g. you haven’t read some important paper, but you know how to find it, and know how to recurse and read the references or background you need, and pretty soon you’ll understand the paper at a professional level).
The reasons you have to really know the state of the art are (1) if you don’t, there are a bunch of pitfalls you can fall into so your chances of novel success are slim, and (2) if you don’t, you won’t know how to contribute to the social process of science.
Which brings us to the more general onerous requirement, one that generalizes steps (a)-(d): Have you done hard work to make this actually useful to other scientists? This is where the steps come back in. Because most LLM-assisted “scientific breakthroughs” are non-quantitative guesses, that hard work is going to look a lot like steps (a) and (b). It means making your idea as quantitative and precise as you can, then going through the existing data to show quantitatively how your idea compares to the current state of the art. After that, maybe propose new experiments, filling in enough detail that you can make quantitative predictions showing how the results would differ between your idea and the state of the art.
- eggsyntax 3 Sep 2025 14:07 UTC
  10 points
  0
  Thanks for the input. I agree with most of what you’re saying. That section is trying to strike a balance between several goals:
  - I’m trying to keep the whole post fairly short so that people who would find it useful will read it. As a result, step 2 in particular is way too short to treat the subject of scientific methodology and practice with anything like the depth it deserves, which I try to at least say out loud in the final paragraph of that step.
  - For readers who have done valid scientific work, I want to make it easier for them to get their work seen, and so I’m aiming for (as you say) ‘Have you done hard work to make this actually useful to other scientists?’
  - For readers who haven’t done good scientific work, I want them to realize that as quickly and painlessly as possible. Hopefully step 1 will accomplish that in many cases. If it hasn’t, then step 2 (in terms of my goals as a writer) is mostly about getting people to think about whether their ideas can cash out into a falsifiable hypothesis that can make quantitative advance predictions. In cases like this that I’ve read, that’s often the problem; the person’s ideas just don’t meet those criteria at all because (for example) they’re a set of fuzzy descriptive claims that use terms in imprecise ways that don’t and can’t make concrete claims about the world.
  The balance I’ve struck is really imperfect. But I suspect that if I say, ‘Well, you don’t always need a falsifiable hypothesis or an experiment’, readers who have been fooled will just assume that their ideas don’t need those things, and so it’ll do more harm than good.
  - eggsyntax 3 Sep 2025 15:45 UTC
    9 points
    0
    Ideas on how to avoid discouraging people doing valid work without providing a way-too-tempting escape hatch are extremely welcome, from you or anyone!
    - OneManyNone 8 Sep 2025 19:43 UTC
      3 points
      −1
      Perhaps this is elitist or counter-productive to say but… do these people actually exist?
      
      By which I mean, are there people who are using LLMs to do meaningful novel research, while also lacking the faculties/self-awareness to realize that LLMs can’t produce or verify novel ideas?
      
      My impression has been that LLMs can only be used productively in situations where one of the following holds:
      
      - The task is incredibly easy
      - Precision is not a requirement
      - You have enough skill that you could have done the thing on your own anyway.
      
      In the last case in particular, LLMs are only an effort-saver, and you’d still need to verify and check every step it took. Novel research in particular requires enormous skill—I’m not sure that someone who had that skill would get to the point where they developed a whole theory without noticing it was made up.
      
      [Also, as a meta-point, this is a great piece but I was wondering if it’s going to be posted somewhere else besides LessWrong? If the target demographic is only LW, I worry that it’s trying to have too many audience. Someone coming to this for advice would see the comments from people like me who were critiquing the piece itself, and that would certainly make it less effective. In the right place (not sure what that is) I think this essay could be much more effective.]
      - eggsyntax 9 Sep 2025 14:50 UTC
        2 points
        0
        Thanks for the reply!
        “LLMs can’t produce or verify novel ideas?”
        I think your view here is too strong. For example, there have been papers showing that LLMs come up with ideas that human judges rate as human-level or above in blind testing. I’ve led a team doing empirical research (described here, results forthcoming) showing that current LLMs can propose and experimentally test hypotheses in novel toy scientific domains.
        So while the typical claimed breakthrough isn’t real, I don’t think we can rule out real ones a priori.
        “If the target demographic is only LW, I worry that it’s trying to have too many audience.”
        I’m not sure what that means, can you clarify?
        “Someone coming to this for advice would see the comments from people like me who were critiquing the piece itself, and that would certainly make it less effective.”
        Maybe? I would guess that people who feel they have a breakthrough are usually already aware that they’re going to encounter a lot of skepticism. That’s just my intuition, though; I could be wrong.
        I’m certainly open to posting it elsewhere. I posted a link to it to Reddit (in r/agi), but people who see it there have to come back here to read it. Suggestions are welcome, and I’m fine with you or anyone else posting it elsewhere with attribution (I’d appreciate getting a link to versions posted elsewhere).
- Lex Spoon 17 Sep 2025 0:01 UTC
  1 point
  0
  I was about to post something similar but will follow up here since your post is close, @Charlie Steiner.
  
  @eggsyntax, the post is conflating two things: scientific validity and community penetration. I think it will reach your target audience better if you separate these two things from each other.
  
  I am going to imagine that most people in the scenario you picture are fantasizing that they will post a result and then all the scientists in an area are going to fawn over them and make their lives easy from now on. This is what I mean by community penetration.
  
  For that angle, Step 3 is the right way to go. Contact people in your target community. Write them a polite email, show them 1-2 brief things that you have done, and then ask them what to do next. This last part is really important. You don’t want to be a threat to them. You want to be an asset to them. Your goals are going to be things like co-writing a paper with them, or redefining your paper so that they can do a companion one, or at the very, very least, adding some citations in your work to theirs or to other people who are influential in the target community.
  
  I don’t think you have to do THAT much homework before step 3. Building relationships is more about a thousand little interactions than one or two ginormous ones.
  
  I do not see a lot about related work in the post so far. I have found related work to be one of the most productive questions I can ask an LLM. They can show you products, papers, articles, and so on that you can go study to see what other people are already doing. This will also show you who you may want to contact for Step 3.
  
  For Steps 1 and 2, I think another way to approach that area is to move away from the yes/no question and over to standards of evidence. Step 2 is great for developing evidence if it applies, but it really depends on the area and on the nature of the idea. It is possible to ask an LLM what the standards of evidence are for an area, and it may tell you something like one of these:
  
  * There may be a way to build a larger version of the idea to make it less of a toy.
  * There may be a variation of the problem that could be explored. A good idea will hold up under multiple contexts, not just the original one.
  * There may be some kind of experiment you can try. Step 2 is terrific as written, but there are other experimental forms that also provide good evidence.
  
  Based on what comes back here, it can be good to have a conversation with the LLM about how to go deeper on one of these angles.
  
  OK, that’s all. Thanks for the post, and good luck with it.