More speculatively, UNDO’ing deception or sycophancy.
That would be pretty sweet.
Another experiment idea: testing whether the reduction in hallucinations that Yao et al. achieved with unlearning can be made robust.