So I ran into this:
https://www.youtube.com/watch?v=AF3XJT9YKpM
And I noticed a lot of talk about error taxonomy.
This seems like an important idea in general, but especially in interpretability for safety.
Specifically, an error taxonomy is a subset of a taxonomy of actions by their consequences, and building that broader taxonomy is the main goal of interpretability for safety (since it allows us to act on the knowledge that the model is about to take an action with bad consequences).
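
To make the subset relation concrete, here is a minimal toy sketch (everything in it is hypothetical and illustrative, not from the talk): actions get classified by predicted consequence, the "error" categories are just the bad-consequence slice of that taxonomy, and the safety payoff is being able to intervene before a bad-consequence action is executed.

```python
from enum import Enum
from typing import Callable

# Hypothetical consequence categories for a model's actions.
# An "error taxonomy" would be the subset of categories whose
# consequences are bad (here, just HARMFUL).
class Consequence(Enum):
    BENIGN = "benign"
    HARMFUL = "harmful"
    UNCERTAIN = "uncertain"

# Stand-in for an interpretability probe that predicts the consequence
# category of a proposed action. A real probe would come from whatever
# interpretability method you trust; this keyword check is only a stub.
def classify_action(action: str) -> Consequence:
    return Consequence.HARMFUL if "delete" in action else Consequence.BENIGN

def act_safely(action: str, execute: Callable[[str], None]) -> None:
    """Act on the predicted consequence: block actions classified as harmful."""
    if classify_action(action) is Consequence.HARMFUL:
        print(f"blocked: {action}")
    else:
        execute(action)

if __name__ == "__main__":
    act_safely("summarize the report", print)   # executes
    act_safely("delete all backups", print)     # blocked
```

The point of the sketch is only the structure: whatever taxonomy of errors you end up with, it sits inside a larger classification of actions by consequence, and it is that larger classification that lets you intervene.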