Finetuning could be an avenue for transmitting latent knowledge between models.
As AI-generated text increasingly makes its way onto the Internet, it seems likely that we’ll finetune AI on text generated by other AI. If this text contains opaque meaning—e.g. due to steganography or latent knowledge—then finetuning could be a way in which latent knowledge propagates between different models.
What is included within “latent knowledge” here? Does it include both knowledge encoded in M1′s weights and knowledge introduced in-context while running it?
I’m imagining it’s something encoded in M1′s weights. But as a cheap test you could add in latent knowledge via the system prompt and then see if finetuning M2 on M1′s generations results in M2 having the latent knowledge
Finetuning could be an avenue for transmitting latent knowledge between models.
As AI-generated text increasingly makes its way onto the Internet, it seems likely that we’ll finetune AI on text generated by other AI. If this text contains opaque meaning—e.g. due to steganography or latent knowledge—then finetuning could be a way in which latent knowledge propagates between different models.
What is included within “latent knowledge” here?
Does it include both knowledge encoded in M1′s weights and knowledge introduced in-context while running it?
I’m imagining it’s something encoded in M1′s weights. But as a cheap test you could add in latent knowledge via the system prompt and then see if finetuning M2 on M1′s generations results in M2 having the latent knowledge