from https://arxiv.org/pdf/2001.08361
Also see the grokking literature: https://en.wikipedia.org/wiki/Grokking_(machine_learning)
Previous discussion:
https://www.lesswrong.com/posts/FRv7ryoqtvSuqBxuT/understanding-deep-double-descent