some lessons from ml research:
any shocking or surprising result in your own experiment is 80% likely to be a bug until proven otherwise. your first thought should always be to comb for bugs.
only after you have ruled out bugs do you get to actually think about how to fit your theory to the data, and even then, there might still be a hidden bug.
most papers are terrible and don’t replicate.
most techniques that sound intuitively plausible don’t work.
most techniques only look good if you don’t pick a strong enough baseline.
an actually good idea can take many tries before it works.
once you have good research intuitions, the most productive state to be in is to literally not think about what will go into the paper and just do experiments that satisfy your curiosity and convince yourself that the thing is true. once you have that, running the final sweeps is really easy.
most people have no intuition whatsoever about their hardware and so will write code that is horribly inefficient. even learning a little bit about hardware fundamentals so you don’t do anything obviously dumb is super valuable (a sketch of one such obviously dumb pattern follows below the list).
in a long and complex enough project, you will almost certainly have a bug that invalidates weeks (or months) of work. being really careful and testing helps but slows down velocity a lot. unclear what the right equilibrium is.
feedback loop time is incredibly important; if you can get rapid feedback, you will make so much more progress.
implementing something that is already known to work is always vastly easier than inventing/researching something new.
you will inevitably spend a lot of time doing things that have no impact on the final published work whatsoever, not even contributing that much useful intuition. this is unfortunate but unavoidable.
oftentimes you will spend a lot of time being fundamentally philosophically confused about what to do, and only really figure it out halfway through the project. this is normal.
direction is really important. most well-executed research is still useless because it was aimed in the wrong direction.
research impact is super super long-tailed. i think it’s really not worth doing research if you aren’t aiming for the long tail. if you’re early career, you should probably focus on doing things that enable you to aim at the long tail eventually, instead of trying to have lots of impact early on (for example, probably better to do something you feel motivated by and learn a lot from than something that is “maximally important” but which you don’t have the skills to execute adequately on yet).
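A minimal sketch of the hardware point above, assuming a PyTorch workflow (the tensor shapes and function names are illustrative, not from the post): the classic obviously dumb pattern is doing per-row work in a Python loop and forcing a device sync on every iteration, when a single batched op does the same thing.

```python
import torch

# illustrative shapes: 10k examples, 512-dim features, one weight matrix
x = torch.randn(10_000, 512, device="cuda" if torch.cuda.is_available() else "cpu")
w = torch.randn(512, 512, device=x.device)

def slow_rowwise(x, w):
    # per-row matmul in a Python loop, with .item() forcing a
    # GPU -> CPU sync on every single iteration
    out, total = [], 0.0
    for i in range(x.shape[0]):
        row = x[i] @ w
        total += row.sum().item()
        out.append(row)
    return torch.stack(out), total

def fast_rowwise(x, w):
    # one batched matmul (a single kernel launch) and one sync at the end
    out = x @ w
    return out, out.sum().item()
```

Both compute the same thing; the first is dramatically slower on a GPU because every iteration launches tiny kernels and stalls on a sync, which is exactly the kind of thing a little hardware intuition lets you avoid.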
I agree and this is why research grant proposals often feel very fake to me. I generally just write up my current best idea / plan for what research to do, but I don’t expect it to actually pan out that way and it would be silly to try to stick rigidly to a plan.
Strongly agree. Exact same experience in research, but in finance / quant trading.
Can basically attest to all of these; I’ve been doing intensive ML upskilling for the last half a year, and almost all of them have held true. Highlights include:
Not properly setting up the attention mechanism in multiple experiments, resulting in the conclusion that attention didn’t do much (lmao)
So, so many off-by-one and off-by-two errors, especially in next-token prediction setups (see the sketch after this list)
Entire series of weeks-long experiments that turn out to be completely useless (usually based on a seemingly-reasonable intuition of some kind)
Accidentally overwriting/resetting the residual element so the RNN was just an NN with a funky hat on
I now hate shapes, reshaping, squeezing, unsqueezing, devices, torch.nn.functional.pad, and so many more functions
Using the wrong loss function
Using the right loss function but with the wrong reduction
Using the right loss function but the learning rate is too aggressive/too low/the optimiser is not initialised properly
Using all the right things but loading the model from an incorrect checkpoint/not saving the weights properly
And also learning that Google Colab was forged in Mount Doom, a tool of great power crafted with malicious intent.
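To make the off-by-one point concrete, here is a sketch of the standard next-token shift (the function name and shapes are illustrative, not the commenter's actual code). Getting the shift wrong usually raises no shape error, so the model quietly learns to copy the current token instead of predicting the next one; the reduction argument is the pitfall mentioned a few bullets down.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, tokens):
    # logits: (batch, seq_len, vocab) -- model output at each position
    # tokens: (batch, seq_len)        -- the input token ids
    # position t should predict token t+1, so drop the last logit and the
    # first token; shifting the wrong way (or not at all) still type-checks.
    pred = logits[:, :-1, :]
    target = tokens[:, 1:]
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),  # (batch * (seq_len - 1), vocab)
        target.reshape(-1),               # (batch * (seq_len - 1),)
        reduction="mean",                 # using "sum" here silently rescales
    )                                     # the gradients / effective LR
```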
Are you using einops and einsum? I hate these somewhat less since using them. See here for more details.
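For anyone in the same boat, a small sketch of why einops helps (the shapes are made up for illustration): the axis names in the pattern document intent, and a mismatch with the actual tensor shape fails loudly instead of silently handing you a correctly sized but wrongly laid-out tensor.

```python
import torch
from einops import rearrange

heads, head_dim = 12, 64
x = torch.randn(8, 16, heads * head_dim)  # (batch, seq, heads * head_dim)

# manual version: easy to mix up axes, and a mistake still returns a tensor
y_manual = x.view(8, 16, heads, head_dim).permute(0, 2, 1, 3)

# einops version: named axes, and the pattern is checked against the shape
y_einops = rearrange(x, "b s (h d) -> b h s d", h=heads)

assert torch.equal(y_manual, y_einops)  # (8, 12, 16, 64) either way
```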
Be right back, just adding all that to my “AI researcher” prompt
I will add: 80% likely to be a bug, or a result from random-matrix theory.
Research sounds really finicky and tedious.