Can basically attest to all of these: I've been doing intensive ML upskilling for the last six months, and almost all of them have held true. Highlights include:
- Not properly setting up the attention mechanism in multiple experiments, resulting in the conclusion that attention didn't do much (lmao)
- So, so many off-by-one and off-by-two errors, especially in next-token prediction setups (first sketch after this list)
- Entire series of weeks-long experiments that turned out to be completely useless (usually based on a seemingly-reasonable intuition of some kind)
- Accidentally overwriting/resetting the residual (recurrent) state, so the RNN was just an NN with a funky hat on (second sketch below)
- I now hate shapes, reshaping, squeezing, unsqueezing, devices, torch.nn.functional.pad, and so many more functions
- Using the wrong loss function
- Using the right loss function but with the wrong reduction (third sketch below)
- Using the right loss function, but the learning rate is too aggressive/too low, or the optimiser is not initialised properly
- Using all the right things, but loading the model from the wrong checkpoint/not saving the weights properly
And also learning that Google Colab was forged in Mount Doom, a tool of great power crafted with malicious intent.
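For concreteness, here's a minimal sketch of the next-token off-by-one (toy tensors, not from any of the experiments above): inputs and targets have to be shifted relative to each other by exactly one position, and the plausible-looking variants silently train on the wrong objective.

```python
import torch

tokens = torch.randint(0, 100, (4, 16))  # toy batch: (batch, seq_len), vocab of 100

# Correct: each input position predicts the token one step ahead
inputs = tokens[:, :-1]   # positions 0 .. n-2
targets = tokens[:, 1:]   # positions 1 .. n-1

# Plausible-looking bugs:
# inputs, targets = tokens, tokens                 # no shift: learns to copy
# inputs, targets = tokens[:, :-2], tokens[:, 2:]  # off-by-two: skips a token

assert inputs.shape == targets.shape
```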
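And a sketch of the state-resetting bug (hypothetical module, assuming an nn.RNNCell, not my actual code): re-initialising `h` inside the time loop throws away all recurrence, leaving a per-step MLP.

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.RNNCell(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        h = x.new_zeros(x.size(0), x.size(2))  # initialise ONCE, before the loop
        outs = []
        for t in range(x.size(1)):
            # Bug version: h = x.new_zeros(...) here resets the state every
            # step, so the "RNN" never sees its own history
            h = self.cell(x[:, t], h)
            outs.append(h)
        return torch.stack(outs, dim=1)

print(TinyRNN(8)(torch.randn(2, 5, 8)).shape)  # torch.Size([2, 5, 8])
```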
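Finally, the reduction footgun: with reduction="sum" the loss (and hence the gradients) scales with batch size, so a learning rate tuned for the default "mean" is suddenly far too hot.

```python
import torch
import torch.nn as nn

logits = torch.randn(32, 100)
labels = torch.randint(0, 100, (32,))

loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, labels)
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits, labels)
print(loss_mean.item(), loss_sum.item())  # sum is ~32x larger here
```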
Are you using einops and einsum? I've hated these somewhat less since I started using them. See here for more details.
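Roughly what they buy you (illustrative sketch; einops is a separate pip install, torch.einsum is built in):

```python
import torch
from einops import rearrange

x = torch.randn(2, 3, 4, 5)  # e.g. (batch, heads, seq, head_dim)

# rearrange documents the shape change in the pattern string itself,
# replacing an opaque chain of permute/reshape calls:
y = rearrange(x, "b h s d -> b s (h d)")
print(y.shape)  # torch.Size([2, 4, 15])

# einsum makes the contraction explicit, e.g. attention scores:
q, k = torch.randn(2, 3, 4, 5), torch.randn(2, 3, 4, 5)
scores = torch.einsum("bhqd,bhkd->bhqk", q, k)
print(scores.shape)  # torch.Size([2, 3, 4, 4])
```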