Then we train to match the original model’s output by minimising an MSE loss Lminimality=MSE(f(x,θ∗),f(x,κ(x)))
I think you wanted
MSE(f(x|θ∗),f(x|κ(x)))
I think you wanted