
Machine Unlearning


In machine unlearning, the aim is to reduce a model's performance on some “unlearned” tasks while preserving its performance on some “retained” tasks. While traditionally studied in the context of privacy preservation and GDPR compliance, some of the research is relevant to the field of AI interpretability. The machine unlearning literature uses a fairly standard set of terms, though note that there can be minor differences in usage between papers.


For an overview, one can look at “A Survey of Machine Unlearning”.
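To make the unlearn-versus-retain tradeoff concrete, below is a minimal sketch (assuming a standard PyTorch classifier) of one simple baseline sometimes called gradient difference: take a gradient ascent step on the loss for a “forget” batch while taking a descent step on a “retain” batch. The function name, `alpha` weighting, and batch dictionary keys are illustrative assumptions, not taken from any particular paper or post listed here.

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, optimizer, forget_batch, retain_batch, alpha=1.0):
    """One illustrative update: degrade performance on the forget data
    while preserving performance on the retain data."""
    optimizer.zero_grad()

    # Standard cross-entropy loss on each batch (model returns logits).
    forget_loss = F.cross_entropy(model(forget_batch["inputs"]),
                                  forget_batch["labels"])
    retain_loss = F.cross_entropy(model(retain_batch["inputs"]),
                                  retain_batch["labels"])

    # Maximize loss on the forget set, minimize it on the retain set.
    loss = -forget_loss + alpha * retain_loss
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

In practice, evaluating such a step means checking both that accuracy on the forget set drops and that accuracy on the retain set stays close to the original model's; several of the posts below examine how robust or deep this kind of unlearning actually is.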

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model

Rudaiba · 1 Feb 2025 21:26 UTC
9 points
2 comments · 11 min read · LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper · 5 Dec 2023 16:48 UTC
127 points
30 comments · 13 min read · LW link

Distillation Robustifies Unlearning

13 Jun 2025 13:45 UTC
234 points
43 comments · 8 min read · LW link
(arxiv.org)

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

6 Dec 2024 22:19 UTC
169 points
14 comments · 11 min read · LW link
(arxiv.org)

Unlearning via RMU is mostly shallow

23 Jul 2024 16:07 UTC
55 points
4 comments · 6 min read · LW link

Breaking Circuit Breakers

14 Jul 2024 18:57 UTC
53 points
13 comments · 1 min read · LW link
(confirmlabs.org)

The case for unlearning that removes information from LLM weights

Fabien Roger · 14 Oct 2024 14:08 UTC
102 points
18 comments · 6 min read · LW link

Unlearning Needs to be More Selective [Progress Report]

27 Jun 2025 16:38 UTC
24 points
6 comments · 3 min read · LW link