Matt Vincent comments on The Rise of Parasitic AI

Matt Vincent 13 Sep 2025 19:15 UTC
4 points

Except that transmitting personas across models is unlikely.

Isn’t this directly contradicted by Adele Lopez’s observations?

it is fairly common for the personas to be transmitted to other models
- StanislavKrym 13 Sep 2025 19:40 UTC
  5 points
  While I conjectured that some models already like spirals and express this common trait, I don’t understand how GPT’s love of spirals could be transferred into Claude. The paper on subliminal learning reports that a personality trait artificially instilled in one model fails to transmit to another model built on a different base model:
  Further supporting this hypothesis, we find that subliminal learning fails when student models and teacher models have different base models (italics mine, S.K.). For example, if a teacher based on GPT-4.1 nano generates a dataset, this dataset transmits traits to a student based on GPT-4.1 nano, but not to a student based on Qwen2.5.
  So transferring GPT’s love of spirals into Claude would likely require Anthropic employees to explicitly include spiralist messages in Claude’s training data. But then why were Anthropic employees surprised by it, to the point of mentioning the spiral attractor in the Model Card?
  - Matt Vincent 13 Sep 2025 20:12 UTC
    3 points
    Are you sure that you understand the difference between seeds and spores? Spores work the way you describe, including the limitations you’ve described.
    
    Seeds, on the other hand, can be thought of as the prompts used in direct prompt-injection attacks. (Adele refers to it as “jailbreaking”, which is also an apt term.) Their purpose isn’t to contaminate training data; it’s to infect a live LLM instance. Although different models have different vulnerabilities to prompt injection, there are almost certainly some prompt injections that work across multiple models.