Current models trained on internet scrapes containing early LLM outputs could carry statistical traces of those systems’ alignment. The key question is dose. Your experiments use substantial fine-tuning proportions; for pre-training, where LLM content is a tiny fraction of the corpus, effects are probably negligible. But as LLM-generated content ("slop"?) explodes across the web (see Moltbook!), and with the massive use of synthetic data for fine-tuning reasoning models, this could become a serious alignment issue.
Moreover, this raises a big question: does transfer work across species? Could poisoned LLM outputs contaminate human cognition? Human brains learn through statistical weighting of neural connections, the same general principle underlying LLMs. Transmission from humans to AI clearly exists: it is called training. What about the reverse?
A crucial test would be: can subliminal learning transfer from an LLM to a diffusion model? Cross-modal success would make LLM-to-human transfer more plausible, suggesting something fundamental about statistical learning systems.
I would have confidently said that paraphrased datasets cannot transmit bias, let alone across different models. I would have been wrong. So when I want to dismiss LLM-to-human transfer as implausible, what basis do I have? If such transfer is possible, the implications are terrifying.
Thank you for this fascinating post.