Certainly making some of the hard-to-see stuff visible enough to iterate on is one of the main lines of attack; that’s the central reason why interpretability work is so valuable.
Certainly making some of the hard-to-see stuff visible enough to iterate on is one of the main lines of attack; that’s the central reason why interpretability work is so valuable.