It seems that an alternative to AI unlearning is often overlooked: just remove the parts of the dataset that contain sensitive (or, for that matter, false) information, or move training on them to the very beginning, so they mainly contribute to learning language syntax and are largely overwritten later. I don’t think one inference pass over the dataset to find such content is meaningfully more expensive than training on it.
Typically, the information being unlearned comes from initial pretraining on massive amounts of internet data, so it may be difficult to pinpoint exactly what to remove while training.
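To make the idea concrete, here is a rough sketch of the kind of single inference pass I mean. Everything here is illustrative: `score_sensitivity` is a hypothetical stand-in for whatever detector you would actually run (a small classifier, an LLM judge, keyword rules), and the toy corpus exists only so the snippet runs.

```python
# Sketch: one cheap inference pass over the corpus before training.
# score_sensitivity() is a hypothetical placeholder detector; the
# real choice of detector is orthogonal to the scheduling idea.

from typing import Iterable, List, Tuple

def score_sensitivity(doc: str) -> float:
    """Hypothetical detector: return a sensitivity score in [0, 1]."""
    blocklist = ("ssn", "password", "home address")  # toy placeholder rules
    return float(any(term in doc.lower() for term in blocklist))

def partition_corpus(
    docs: Iterable[str], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split docs into (clean, sensitive) in a single pass."""
    clean: List[str] = []
    sensitive: List[str] = []
    for doc in docs:
        (sensitive if score_sensitivity(doc) >= threshold else clean).append(doc)
    return clean, sensitive

# Toy corpus, just so the example is self-contained.
corpus = [
    "The capital of France is Paris.",
    "My password is hunter2, please keep it safe.",
]

clean, sensitive = partition_corpus(corpus)

# Option A: drop the sensitive slice entirely.
training_order = clean

# Option B: front-load it so it mostly teaches syntax and is
# largely overwritten by later training, as suggested above.
training_order = sensitive + clean
```

The pass costs one forward call of the detector per document, which is roughly comparable to (or cheaper than) one training epoch over the same data.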