Reconstruction loss is the CE loss of the patched model
If this is accurate, then I agree that this is not the same as “the KL divergence between the normal model and the model when you patch in the reconstructed activations”. But Fengyuan described the reconstruction score as:
measures how replacing activations changes the total loss of the model
which I still claim is equivalent to the CE loss of the patched model (up to subtracting the unpatched model's loss).
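Here is a minimal sketch of the distinction. It assumes next-token cross-entropy at a single position and uses toy logits I made up, so the function names and numbers are illustrative only and do not come from any SAE library. "How replacing activations changes the total loss" is the difference of two CE losses, while the KL divergence compares the full output distributions of the clean and patched models.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a single vocabulary vector
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits, target):
    # Next-token CE loss: negative log-probability assigned to the true token
    return -log_softmax(logits)[target]

def kl_divergence(clean_logits, patched_logits):
    # KL(P_clean || P_patched), summed over the vocabulary
    lp = log_softmax(clean_logits)
    lq = log_softmax(patched_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

# Hypothetical toy logits for a 3-token vocabulary; the true next token is index 0
clean = [2.0, 1.0, 0.0]     # unpatched model
patched = [1.5, 1.5, 0.0]   # model with reconstructed activations patched in
target = 0

ce_clean = cross_entropy(clean, target)
ce_patched = cross_entropy(patched, target)  # "CE loss of the patched model"
delta_loss = ce_patched - ce_clean           # change in loss from patching
kl = kl_divergence(clean, patched)           # a different quantity

print(f"CE clean   = {ce_clean:.4f}")
print(f"CE patched = {ce_patched:.4f}")
print(f"loss delta = {delta_loss:.4f}")
print(f"KL         = {kl:.4f}")
```

The loss delta depends only on the probability of the true token, whereas the KL depends on the whole distribution. The two can diverge sharply: the loss delta can even be negative if patching happens to raise the probability of the true token, while KL is always non-negative.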
I’m confused: why are you so confident that we should avoid processed food? Isn’t the whole point of your post that we don’t know whether processed oil is bad for you? Where’s the overwhelming evidence that processed food in general is bad?