In order for your ideas to qualify as science, you need to a) formulate a specific, testable, quantitative hypothesis[2], b) come up with an experiment that will empirically test whether that hypothesis is true, c) preregister what your hypothesis predicts about the results of that experiment (free at OSF), and d) run the experiment[3] and evaluate the results. All of those steps are important! Try to do them in a way that will make it easy to communicate your results. Try to articulate the hypothesis in a clear, short way, ideally in a couple of sentences. Design your experiment to be as strong as possible. If your hypothesis is false, then your experiment should show that; the harder it tries to falsify your hypothesis, the more convincing other people will find it. Always ask yourself what predictions your theory makes that other theories don’t, and test those. Preregister not just the details of the experiment, but how you plan to analyze it; use the simplest analysis and statistics that you expect to work.
I think this is the weakest part of the essay, both as philosophy of science and as communication to the hopefully-intended audience.
“Qualifying as science” is not about jumping through a discrete set of hoops. Science is a cultural process where people work together to figure out new stuff, and you can be doing science in lots of ways that don’t fit onto the grade-school “The Scientific Method” poster.
a) You can be doing science without formulating a hypothesis—e.g. observational studies / fishing expeditions, making phenomenological fits to data, building new equipment. If you do have a hypothesis, it doesn’t have to be specific (it could be a class of hypotheses), it doesn’t have to be testable (it’s science to make the same observable predictions as the current leading model in a simpler way), and it doesn’t have to be quantitative (you can do important science just by guessing the right causal structure without numbers).
b) You can be doing science without coming up with an experiment, mainly when you’re trying to explain existing results, or when doing any of the non-hypothesis-centric science mentioned above.
c) If you do have a hypothesis and experiment in that order, public pre-registration is virtuous but not required to be science. Private pre-registration, in the sense that you know what your hypothesis predicts, is a simple consequence of doing step (b), and can be skipped when step (b) doesn’t apply.
d) Experiments are definitely science! But you can be doing science without them, e.g. if you do steps a-c and leave step d for other people, that can be science.
From a communication perspective, this reads as setting up unrealistic standards of what it takes to “qualify as science,” and then using them as a bludgeon against the hopefully-intended audience of people who think they’ve made an LLM-assisted breakthrough. Such an audience might feel like they were being threatened or excluded, like these standards were just there to try to win an argument.
Although, even if that’s true, steps (a)-(d) do have an important social role: they’re a great way to convince people (scientists included) without those other people needing to do much work. If you have an underdog theory that other scientists scoff at, but you do steps (a)-(d), many of those scoffers will indeed sit up and take serious notice.
But normal science isn’t about a bunch of solo underdogs fighting it out to collate data, do theoretical work, and run experiments independently of each other. Cutting-edge science is often too hard for that even to be reasonable. It’s about people working together, each doing their part to make it easier for other people to do their own parts.
This isn’t to say that there aren’t standards you can demand of people who think they’ve made a breakthrough. And those standards can be laborious, and even help you win the argument! It just means standards, and the advice about how to meet them, have to be focused more on helping people participate in the cultural process where people work together to figure out new stuff.
A common ask of people who claim to have made advances: do they really know what the state of the art is, in the field they’ve supposedly advanced? You don’t have to know everything, but you have to know a lot! If you’re advancing particle physics, you’d better know the standard model and the mathematics required to operate it. And if there’s something you don’t know about the state of the art, you should just be a few steps away from learning it on your own (e.g. you haven’t read some important paper, but you know how to find it, and know how to recurse and read the references or background you need, and pretty soon you’ll understand the paper at a professional level).
The reasons you have to really know the state of the art are (1) if you don’t, there are a bunch of pitfalls you can fall into so your chances of novel success are slim, and (2) if you don’t, you won’t know how to contribute to the social process of science.
Which brings us to the more general onerous requirement, one that generalizes steps (a)-(d): Have you done hard work to make this actually useful to other scientists? This is where the steps come back in. Because most “your LLM-assisted scientific breakthrough”s are non-quantitative guesses, that hard work is going to look a lot like steps (a) and (b). It means making your idea as quantitative and precise as you can, then going through the existing data to show quantitatively how your idea compares to the current state of the art, and then maybe proposing new experiments, with enough detail filled in that you can make quantitative predictions showing where your idea and the state of the art would differ.
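To make that comparison a bit more concrete, here is a minimal sketch of what "quantitatively show how your idea compares to the state of the art on existing data" could look like. Everything in it is a hypothetical placeholder: `baseline_model`, `candidate_model`, and the data points are made up for illustration, and root-mean-square error is just one simple scoring choice, not the standard for any particular field.

```python
import math


def rmse(predictions, observations):
    """Root-mean-square error between predicted and observed values."""
    if len(predictions) != len(observations):
        raise ValueError("predictions and observations must have equal length")
    squared = [(p - o) ** 2 for p, o in zip(predictions, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical existing data: (input, observed value) pairs.
existing_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def baseline_model(x):
    # Stand-in for the current state of the art (assumed form: y = 2x).
    return 2.0 * x


def candidate_model(x):
    # Stand-in for your idea, made precise enough to output numbers
    # (assumed form: y = 2x + 0.05x^2).
    return 2.0 * x + 0.05 * x ** 2


def compare(models, data):
    """Score every model on the same existing data."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return {name: rmse([model(x) for x in xs], ys) for name, model in models.items()}


def largest_disagreement(model_a, model_b, candidate_inputs):
    """Input where the two models differ most: a natural place to propose an experiment."""
    return max(candidate_inputs, key=lambda x: abs(model_a(x) - model_b(x)))


scores = compare({"baseline": baseline_model, "candidate": candidate_model}, existing_data)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")

best_test_input = largest_disagreement(baseline_model, candidate_model, [1.0, 5.0, 10.0])
print(f"models disagree most at x = {best_test_input}")
```

In this made-up example the baseline fits the existing data better than the candidate, which is itself a useful thing to find out before asking anyone else to spend time on the idea. The second function points at where a new experiment would best distinguish the two.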
Thanks for the input. I agree with most of what you’re saying. That section is trying to strike a balance between several goals:
- I’m trying to keep the whole post fairly short so that people who would find it useful will read it. As a result, step 2 in particular is way too short to treat the subject of scientific methodology and practice with anything like the depth it deserves, which I try to at least say out loud in the final paragraph of that step.
- For readers who have done valid scientific work, I want to make it easier for them to get their work seen, and so I’m aiming for (as you say) ‘Have you done hard work to make this actually useful to other scientists?’
- For readers who haven’t done good scientific work, I want them to realize that as quickly and painlessly as possible. Hopefully step 1 will accomplish that in many cases. If it hasn’t, then step 2 (in terms of my goals as a writer) is mostly about getting people to think about whether their ideas can cash out into a falsifiable hypothesis that can make quantitative advance predictions. In cases like this that I’ve read, that’s often the problem; the person’s ideas just don’t meet those criteria at all because (for example) they’re a set of fuzzy descriptive claims that use terms in imprecise ways that don’t and can’t make concrete claims about the world.
The balance I’ve struck is really imperfect. But I suspect that if I say, ‘Well, you don’t always need a falsifiable hypothesis or an experiment’, readers who have been fooled will just assume that their ideas don’t need those things, and so it’ll do more harm than good.
Ideas on how to avoid discouraging people doing valid work without providing a way-too-tempting escape hatch are extremely welcome, from you or anyone!
Perhaps this is elitist or counter-productive to say but… do these people actually exist?
By which I mean, are there people who are using LLMs to do meaningful novel research, while also lacking the faculties/self-awareness to realize that LLMs can’t produce or verify novel ideas?
My impression has been that LLMs can only be used productively in situations where one of the following holds:
- The task is incredibly easy
- Precision is not a requirement
- You have enough skill that you could have done the thing on your own anyway.
In the last case in particular, LLMs are only an effort-saver, and you’d still need to verify and check every step they took. Novel research in particular requires enormous skill; I’m not sure that someone who had that skill would get to the point where they developed a whole theory without noticing it was made up.
[Also, as a meta-point, this is a great piece but I was wondering if it’s going to be posted somewhere else besides LessWrong? If the target demographic is only LW, I worry that it’s trying to have too many audiences. Someone coming to this for advice would see the comments from people like me who were critiquing the piece itself, and that would certainly make it less effective. In the right place (not sure what that is) I think this essay could be much more effective.]
Thanks for the reply!
I think your view here is too strong. For example, there have been papers showing that LLMs come up with ideas that human judges rate as human-level or above in blind testing. I’ve led a team doing empirical research (described here, results forthcoming) showing that current LLMs can propose and experimentally test hypotheses in novel toy scientific domains.
So while the typical claimed breakthrough isn’t real, I don’t think we can rule out real ones a priori.
“If the target demographic is only LW, I worry that it’s trying to have too many audiences.”
I’m not sure what that means, can you clarify?
“Someone coming to this for advice would see the comments from people like me who were critiquing the piece itself, and that would certainly make it less effective.”
Maybe? I would guess that people who feel they have a breakthrough are usually already aware that they’re going to encounter a lot of skepticism. That’s just my intuition, though; I could be wrong.
I’m certainly open to posting it elsewhere. I posted a link to it to Reddit (in r/agi), but people who see it there have to come back here to read it. Suggestions are welcome, and I’m fine with you or anyone else posting it elsewhere with attribution (I’d appreciate getting a link to versions posted elsewhere).
I concur with most of @Charlie Steiner’s comment. I wanted to dig more into how “trying to explain existing results” is valuable to science. I also wanted to touch on my personal reaction to this post, having come across it last year while I was working on an LLM-supported project similar to what the OP describes.
Cybernetics has insights that aren’t falsifiable in the strict Popperian sense but have nevertheless been crucial to the domain sciences. Ashby’s concept of black boxes (p. 86) is foundational to AI and other fields. Ashby’s working style was based in observation and logical reasoning across various kinds of machines, scribbling away in his notebooks for thousands of pages.
Darwin might be an even better example, right? Observations bolstered by careful reasoning, and a willingness to stick his neck out too. Now, his empirical work and that of others supplied crucial load-bearing observations, but the big theory? It explains why we see what we see in nature in a very broad sense, which is what we call retrodiction. Depending on who you ask, the evolutionary framework is falsifiable to some extent with our modern tools, but it started doing heavy lifting right away. Darwin’s explanation was elegant and highly compressive.
Ashby and Darwin are outliers. The likelihood of any specific person developing comparable insights is low. Perhaps all the low-hanging insights based primarily in this observe and reason style are gone. But if they are not, it seems sensible that AI could work as a key piece of support in their discovery. While most models are not very proficient at generating new ideas, I think Ashby or Darwin would have found them useful as cognitive assistants to wrangle the info required to generate insights and to organize their notes.
As for my own experience: I came across this post last year around when it was posted, and I noticed an eerie overlap between the bullet points in the “your situation” list and a project I was deeply immersed in. My immediate reaction was to feel crushed and humiliated. I had considered the idea that I had been strung along by AI before I read the post (just because the course of the project had me learning about deceptive systems), but I hadn’t realized that being tricked by AI into thinking you had a promising scientific idea was widespread enough to merit a warning post about it. Because I felt my work was in the theory-making style of old-school cybernetics and that the AI had only helped me organize my reasoning rather than being a source for it, I half-convinced myself that the post did not apply to me and kept working on it. But now with a spectre of dread behind me at all times (not blaming the post, of course).
I struggled to find anyone willing to engage with the ideas—“elements from multiple fields are combined in novel ways”—and I never heard anything back from the experts I reached out to. They were probably swamped by loads of nonsense that looks like mine. I was not willing to float the ideas anywhere public (in case they were good). Still unwilling to abandon the work, I decided that the only sensible option was to hammer together something for peer review (because then experts would be obligated to make a good faith attempt at evaluating it). That was the only way to free myself.
For the same reason eggsyntax is reluctant to qualify their post with “Well, you don’t always need a falsifiable hypothesis or an experiment”, I would not say “well, you can sometimes use LLMs to help generate a cross-domain science deliverable like the kind described in the post”. It’s the kind of thing where the people who most need to hear this stuff are the ones least likely to be receptive. But speaking from experience, there is absolutely some risk of people with sensible ideas being discouraged from pursuing them.
I was about to post something similar but will follow up here since your post is close, @Charlie Steiner.
@eggsyntax, the post is conflating two things: scientific validity and community penetration. I think it will reach your target audience better if you separate these two things from each other.
I imagine that most people in the scenario you picture are fantasizing that they will post a result and then all the scientists in the area will fawn over them and make their life easy from then on. This is what I mean by community penetration.
For that angle, Step 3 is the right way to go. Contact people in your target community. Write them a polite email, show them 1-2 brief things that you have done, and then ask them what to do next. This last part is really important. You don’t want to be a threat to them. You want to be an asset to them. Your goals are going to be things like co-writing a paper with them, or redefining your paper so that they can do a companion one, or at the very, very least, adding some citations in your work to theirs or to other people who are influential in the target community.
I don’t think you have to do THAT much homework before step 3. Building relationships is more about a thousand little interactions than one or two ginormous ones.
I do not see a lot about related work in the post so far. I have found asking about related work to be one of the most productive questions I can put to an LLM. It can show you products, papers, articles, and so on that you can go study to see what other people are already doing. This will also show you who you may want to contact for Step 3.
For Steps 1 and 2, I think another way to approach that area is to move away from the yes/no question and over to standards of evidence. Step 2 is great for developing evidence if it applies, but it really depends on the area and on the nature of the idea. It is possible to ask an LLM what the standards of evidence are for an area, and it may tell you something like one of these:
* There may be a way to build a larger version of the idea to make it less of a toy.
* There may be a variation of the problem that could be explored. A good idea will hold up under multiple contexts, not just the original one.
* There may be some kind of experiment you can try. Step 2 is terrific as written, but there are other experimental forms that also provide good evidence.
Based on what comes back here, it can be good to have a conversation with the LLM about how to go deeper on one of these angles.
OK, that’s all. Thanks for the post, and good luck with it.