Thanks for pointing that out!
You’re right about the “cloning of weights.” That was an overextension of the mechanical metaphor. It’s more accurate to say that in both Hebbian Learning and in distillation, you are approximating a mapping function. Updated for precision to “how LLMs can map their internal weights to clone each other’s outputs as part of a distillation attack.”
I wasn’t being facetious; I really do appreciate you pointing that out. I’m striving to make sure the metaphors for this model line up across AI alignment, cognitive science, art, neuroscience, and UX. It can be hard to tell where to drop the definition when speaking to all these groups at once. It’s a balancing act between scaring off the non-technical artists and annoying professionals with imprecision. Perhaps someone like you would be more interested in a more formal outline of this model? https://doi.org/10.5281/zenodo.19407789
And to clarify, the “dominant model for learning” to which I referred was Hebbian Learning, which I’m confident you’ll agree is the dominant model in this specific context. I must apologize for the confusion: my website has hoverable in-line citations with quotes, and I forgot that I sometimes relied on them implicitly to justify points instead of explaining in-line. It was my way of trying to address the difficulty of presenting such an interdisciplinary unifying model.
Presenting citations with optional quotations serves as both a nudge for academics like yourself and a jumping-off point for those unfamiliar with a particular research stream. The problem is presenting them here. I could just cite the sources, but that strips the quotations; there’s not really a way to give a citation here in the format of “optional quotation for those who are unfamiliar.”
Guess my next post will include them regardless. I appreciate your points here, thanks much!