Wait, is this the solution to catastrophic forgetting in fine-tuning? I mean your KL regularisation math.