The memo trap reminds me of Anthropic's recent work on superposition, memorization, and double descent; it's plausible that there's U-shaped scaling in there somewhere for similar reasons. But because superposition's usefulness for memorization scales exponentially, maybe the paper actually implies the opposite? Hm.