[Question] Would it be useful to collect the contexts where various LLMs think the same?

My initial idea was: let's see where the small, interpretable model makes the same inference as the huge, dangerous model, and focus on those cases in the small model to help explain the bigger one. Quite likely I am wrong, but with a tiny chance for good impact, I have set up a repository.
I would love your feedback on this direction before I start actually generating the pairs/sets of context + the LMs that match on that context.
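
For concreteness, here is a rough sketch of one way "matching on a context" could be operationalised: two models agree on a context if they assign the same top-1 next token to it. The model names and the agreement criterion below are just placeholder assumptions for illustration, not a settled design.

```python
# Minimal sketch: collect contexts where a small and a larger model
# make the same top-1 next-token prediction. Model names are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def top1_next_token(model_name: str, context: str) -> str:
    """Return the model's most likely next token for the given context."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    next_id = logits[0, -1].argmax().item()
    return tokenizer.decode(next_id)

def models_match(small: str, large: str, context: str) -> bool:
    """True if both models predict the same next token for this context."""
    return top1_next_token(small, context) == top1_next_token(large, context)

# Example: keep only the contexts where the two models agree.
contexts = ["The capital of France is", "2 + 2 ="]
matching_contexts = [c for c in contexts if models_match("gpt2", "gpt2-medium", c)]
print(matching_contexts)
```

A stricter criterion (e.g. agreement over the full next-token distribution, or over several continuation steps) would catch more meaningful overlap, but top-1 agreement is the simplest starting point.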
