IMO the most useful version of this would be to get empirical evidence on techniques. E.g. erasing certain concepts using LEACE and seeing if they can inhibit the model’s use of those concepts including during further training. It seems hard to ensure otherwise that there is not some gap between your definitions and reality.
IMO the most useful version of this would be to get empirical evidence on techniques. E.g. erasing certain concepts using LEACE and seeing if they can inhibit the model’s use of those concepts including during further training. It seems hard to ensure otherwise that there is not some gap between your definitions and reality.