Thanks for this. It seems important. Learning still happening after weights are frozen? That’s crazy. I think it’s a big deal because it is evidence for mesa-optimization being likely and hard to avoid.
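To make the phenomenon concrete for anyone who hasn't read the paper: here is a minimal toy sketch of the frozen-weights effect. This is my own construction, not the paper's experiment; the task family (random linear functions), the GRU architecture, and all hyperparameters are made up for illustration. The point is just that a network meta-trained across many tasks can keep adapting to brand-new tasks at test time with every weight frozen, because the adaptation lives in its hidden state rather than in its parameters.

```python
# Toy sketch (my own setup, NOT the paper's): an RNN meta-trained on a family
# of tasks still "learns" at test time with all weights frozen, because the
# within-episode adaptation happens in the hidden state, not the parameters.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_episode(batch, steps):
    # Each episode is a fresh linear task y = w * x; the net sees (x_t, y_{t-1})
    # and must infer w from context to predict y_t.
    w = torch.randn(batch, 1)
    x = torch.randn(batch, steps)
    y = w * x
    prev_y = torch.cat([torch.zeros(batch, 1), y[:, :-1]], dim=1)
    inputs = torch.stack([x, prev_y], dim=-1)  # (batch, steps, 2)
    return inputs, y.unsqueeze(-1)             # targets: (batch, steps, 1)

class Learner(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, inputs):
        h, _ = self.rnn(inputs)
        return self.head(h)

net = Learner()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Meta-training: the outer loop updates weights across many sampled tasks.
for step in range(3000):
    inputs, targets = make_episode(batch=64, steps=20)
    loss = nn.functional.mse_loss(net(inputs), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Test: freeze every weight, then look at per-timestep error on new tasks.
for p in net.parameters():
    p.requires_grad_(False)
with torch.no_grad():
    inputs, targets = make_episode(batch=1000, steps=20)
    err = ((net(inputs) - targets) ** 2).mean(dim=(0, 2))
    print([round(e.item(), 3) for e in err])
```

In runs of this kind the per-timestep error typically falls steeply within an episode even though no parameter changes after the freeze; that within-episode improvement is the "learning after the weights are frozen."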
It also seems like evidence for the Scaling Hypothesis. One major way the scaling hypothesis could be false is if there are further insights needed to get transformative AI, e.g. a new algorithm or architecture. A simple neural network spontaneously learning to do its own, more efficient form of learning? This seems like a data point in favor of the idea that our current architectures and algorithms are fine, and will eventually (if they are big enough) grope their way towards more efficient internal structures on their own.
EDIT: Now I’m less sure of all the above, thanks to Rohin’s comment below. I guess this is a case of “evidence to the people who didn’t already understand the theory well enough to make the prediction,” which maybe included me? Though I think I would have made the prediction too, had I been asked…
Learning still happening after weights are frozen? That’s crazy. I think it’s a big deal because it is evidence for mesa-optimization being likely and hard to avoid.
Sure. We see that elsewhere too, like Dactyl. And of course, GPT-3.
Huh, thanks.
Two separate size parameters: the size of the search space, and the size the traversal algorithm needs to be to span the same gaps brains did.