I still don’t quite see the connection—if it turns out that LLFC holds between different fine-tuned models to some degree, how will this help us interpolate between different simulacra?
Is the idea that we could fine-tune models to only instantiate certain kinds of behaviour and then use LLFC to interpolate between (and maybe even extrapolate beyond?) different kinds of behaviour?
Yes, roughly (the next comment is supposed to make the connection clearer, though also more speculative); RLHF / supervised fine-tuned models would correspond to ‘more mode-collapsed’ / narrower mixtures of simulacra here (in the limit of mode collapse, one fine-tuned model = one simulacrum).
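To make the interpolation idea concrete, here's a minimal sketch (assuming LLFC-style linearity holds well enough that weight-space interpolation between two fine-tuned checkpoints yields coherent intermediate behaviour — which is exactly the speculative part). The models here are toy stand-ins, not real fine-tuned LLMs:

```python
# Speculative sketch: blending two fine-tuned models by interpolating
# their weights. Assumes both share the same architecture and that
# something like LLFC makes the interpolated model behave sensibly.
import torch
import torch.nn as nn

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Return alpha * sd_a + (1 - alpha) * sd_b, parameter by parameter."""
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

# Toy stand-ins for two checkpoints fine-tuned toward different behaviours
# (in the mode-collapse limit: two different simulacra).
torch.manual_seed(0)
model_a = nn.Linear(4, 2)  # hypothetical "behaviour A" checkpoint
model_b = nn.Linear(4, 2)  # hypothetical "behaviour B" checkpoint

blended = nn.Linear(4, 2)
blended.load_state_dict(
    interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha=0.5)
)
# alpha outside [0, 1] would correspond to the (even more speculative)
# extrapolation case.
```

Whether the midpoint model actually instantiates a coherent mixture of the two behaviours is the empirical question the LLFC results would bear on.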