some lessons from ml research:
any shocking or surprising result in your own experiment is 80% likely to be a bug until proven otherwise. your first thought should always be to comb for bugs.
only after you have ruled out bugs do you get to actually think about how to fit your theory to the data, and even then, there might still be a hidden bug.
most papers are terrible and don’t replicate.
most techniques that sound intuitively plausible don’t work.
most techniques only look good if you don’t pick a strong enough baseline.
an actually good idea can take many tries before it works.
once you have good research intuitions, the most productive state to be in is to literally not think about what will go into the paper and just do experiments that satisfy your curiosity and convince yourself that the thing is true. once you have that, running the final sweeps is really easy.
most people have no intuition whatsoever about their hardware and so will write code that is horribly inefficient. even learning a little bit about hardware fundamentals so you don’t do anything obviously dumb is super valuable (a sketch of one such obviously dumb pattern follows below the list).
in a long and complex enough project, you will almost certainly have a bug that invalidates weeks (or months) of work. being really careful and testing helps but slows down velocity a lot. unclear what the right equilibrium is.
feedback loop time is incredibly important; if you can get rapid feedback, you will make so much more progress.
implementing something that is already known to work is always vastly easier than inventing/researching something new.
you will inevitably spend a lot of time doing things that have no impact on the final published work whatsoever, not even contributing that much useful intuition. this is unfortunate but unavoidable.
oftentimes you will spend a lot of time being fundamentally philosophically confused about what to do, and only really figure it out halfway through the project. this is normal.
direction is really important. most well-executed research is still useless because it was aimed in the wrong direction.
research impact is super super long-tailed. i think it’s really not worth doing research if you aren’t aiming for the long tail. if you’re early career, you should probably focus on doing things that enable you to aim at the long tail eventually, instead of trying to have lots of impact early on (for example, probably better to do something you feel motivated by and learn a lot from than something that is “maximally important” but which you don’t have the skills to execute adequately on yet).
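A minimal sketch of the hardware point above, assuming a PyTorch workflow (the tensor shapes and function names are illustrative, not from the post): the classic obviously dumb pattern is doing per-row work in a Python loop and forcing a device sync on every iteration, when a single batched op does the same thing.

```python
import torch

# illustrative shapes: 10k examples, 512-dim features, one weight matrix
x = torch.randn(10_000, 512, device="cuda" if torch.cuda.is_available() else "cpu")
w = torch.randn(512, 512, device=x.device)

def slow_rowwise(x, w):
    # per-row matmul in a Python loop, with .item() forcing a
    # GPU -> CPU sync on every single iteration
    out, total = [], 0.0
    for i in range(x.shape[0]):
        row = x[i] @ w
        total += row.sum().item()
        out.append(row)
    return torch.stack(out), total

def fast_rowwise(x, w):
    # one batched matmul (a single kernel launch) and one sync at the end
    out = x @ w
    return out, out.sum().item()
```

Both compute the same thing; the first is dramatically slower on a GPU because every iteration launches tiny kernels and stalls on a sync, which is exactly the kind of thing a little hardware intuition lets you avoid.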
I agree and this is why research grant proposals often feel very fake to me. I generally just write up my current best idea / plan for what research to do, but I don’t expect it to actually pan out that way and it would be silly to try to stick rigidly to a plan.
Strongly agree. Exact same experience in research, but in finance / quant trading.
Can basically attest to all of these; I’ve been doing intensive ML upskilling for the last half a year, and almost all of them have held true. Highlights include:
Not properly setting up the attention mechanism in multiple experiments, resulting in the conclusion that attention didn’t do much (lmao)
So, so many off-by-one and off-by-two errors, especially in next-token prediction setups (see the sketch after this list)
Entire series of weeks-long experiments that turn out to be completely useless (usually based on a seemingly-reasonable intuition of some kind)
Accidentally overwriting/resetting the residual element so the RNN was just an NN with a funky hat on
I now hate shapes, reshaping, squeezing, unsqueezing, devices, torch.nn.functional.pad, and so many more functions
Using the wrong loss function
Using the right loss function but with the wrong reduction
Using the right loss function but the learning rate is too aggressive/too low/the optimiser is not initialised properly
Using all the right things but loading the model from an incorrect checkpoint/not saving the weights properly
And also learning that Google Colab was forged in Mount Doom, a tool of great power crafted with malicious intent.
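To make the off-by-one point concrete, here is a sketch of the standard next-token shift (the function name and shapes are illustrative, not the commenter's actual code). Getting the shift wrong usually raises no shape error, so the model quietly learns to copy the current token instead of predicting the next one; the reduction argument is the pitfall mentioned a few bullets down.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, tokens):
    # logits: (batch, seq_len, vocab) -- model output at each position
    # tokens: (batch, seq_len)        -- the input token ids
    # position t should predict token t+1, so drop the last logit and the
    # first token; shifting the wrong way (or not at all) still type-checks.
    pred = logits[:, :-1, :]
    target = tokens[:, 1:]
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),  # (batch * (seq_len - 1), vocab)
        target.reshape(-1),               # (batch * (seq_len - 1),)
        reduction="mean",                 # using "sum" here silently rescales
    )                                     # the gradients / effective LR
```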
Are you using einops and einsum? I hate these somewhat less since using them. See here for more details.
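For anyone in the same boat, a small sketch of why einops helps (the shapes are made up for illustration): the axis names in the pattern document intent, and a mismatch with the actual tensor shape fails loudly instead of silently handing you a correctly sized but wrongly laid-out tensor.

```python
import torch
from einops import rearrange

heads, head_dim = 12, 64
x = torch.randn(8, 16, heads * head_dim)  # (batch, seq, heads * head_dim)

# manual version: easy to mix up axes, and a mistake still returns a tensor
y_manual = x.view(8, 16, heads, head_dim).permute(0, 2, 1, 3)

# einops version: named axes, and the pattern is checked against the shape
y_einops = rearrange(x, "b s (h d) -> b h s d", h=heads)

assert torch.equal(y_manual, y_einops)  # (8, 12, 16, 64) either way
```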
Be right back, just adding all that to my “AI researcher” prompt
I will add: 80% likely to be a bug, or a result from random-matrix theory.
Research sounds really finicky and tedious.