Daniel Kokotajlo comments on Google’s new 540 billion parameter language model

Daniel Kokotajlo 5 Apr 2022 17:08 UTC
7 points
0
Am I right in thinking that, according to your theory, the “fix” they did (restarting training from a checkpoint 100 steps before the spike started, but with different data, to avoid the spike) is actually counterproductive because it’s preventing the model from grokking? And that they should instead have just kept training to “push through the spike” and get to a new, lower-loss regime?
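To make the “fix” concrete, here is a minimal toy sketch of the rollback procedure as described above. It is not the actual training code: `train_with_rollback`, `step_fn`, the spike threshold `factor`, and the averaging `window` are all hypothetical names and choices made for illustration. The only details taken from the description are the rollback of roughly 100 steps and the resumption on different data.

```python
def train_with_rollback(step_fn, init_state, batches,
                        rollback_steps=100, window=20, factor=2.0):
    """Toy training loop that rolls back when the loss spikes.

    step_fn(state, batch) -> (new_state, loss) is a hypothetical update function.
    A spike is (by assumption) a loss above `factor` times the mean of the
    last `window` accepted losses.
    """
    checkpoints = [init_state]  # checkpoints[i] is the state after i accepted steps
    losses = []                 # losses[i] is the loss of accepted step i + 1
    rollbacks = []              # (steps completed at spike, step restored to)

    # The data iterator is never rewound, so after a rollback the model
    # continues on batches it has not seen, i.e. on different data.
    for batch in iter(batches):
        new_state, loss = step_fn(checkpoints[-1], batch)
        recent = losses[-window:]
        if recent and loss > factor * (sum(recent) / len(recent)):
            keep = max(0, len(losses) - rollback_steps)
            rollbacks.append((len(losses), keep))
            del checkpoints[keep + 1:]  # restore the older checkpoint
            del losses[keep:]           # discard the rolled-back history
            continue                    # the offending batch is skipped
        checkpoints.append(new_state)
        losses.append(loss)

    return checkpoints[-1], losses, rollbacks
```

The “push through the spike” alternative would correspond to never taking the rollback branch and simply accepting every step, spike included.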
- FeepingCreature 6 Apr 2022 4:51 UTC
  4 points
  0
  Now I’m not saying it’s anthropic pressure, but if that’s true maybe we shouldn’t just keep training until we know what exactly it is that the model is grokking.
  - Throwaway2367 14 Apr 2022 23:08 UTC
    5 points
    0
    Whatever is happening, I’m really concerned about the current “sufficiently big model starts to exhibit <weird behaviour A>. I don’t understand, but also don’t care, here is a dirty workaround and just give it more compute lol” paradigm. I don’t think this is very safe.
    - Not Relevant 15 Apr 2022 1:05 UTC
      1 point
      0
      If I could get people to change that paradigm, you bet I would.