[Question] How optimistic should we be about AI figuring out how to interpret itself?

Intuitively, it seems to me that a not-that-powerful AI could do a really good job of interpreting other neural nets, trained via some sort of human feedback signal for how “easy to understand” an explanation is. I would like to hear why this intuition is right or wrong.
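To make the intuition concrete, here is a minimal sketch of the kind of feedback loop I have in mind: a scorer is trained on pairwise human judgments of which of two explanations is easier to understand (Bradley-Terry style), and that learned score could then be used as a reward for an explanation-generating model. Everything below (the feature vectors, the linear scorer, the toy data) is a hypothetical placeholder, not a claim about how a real system would be built.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(w, x):
    # Hypothetical "understandability" score: a linear function of
    # some feature representation of an explanation.
    return x @ w

def train_preference_model(pairs, dim, lr=0.1, steps=500):
    """Fit a scorer from pairwise human feedback.

    pairs: list of (x_preferred, x_rejected) feature vectors, where a
    human judged the first explanation easier to understand.
    """
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for x_pos, x_neg in pairs:
            # Bradley-Terry model: P(pos preferred) = sigmoid(s_pos - s_neg)
            p = 1.0 / (1.0 + np.exp(-(score(w, x_pos) - score(w, x_neg))))
            # Gradient of the log-likelihood of the observed preference
            grad += (1.0 - p) * (x_pos - x_neg)
        w += lr * grad / len(pairs)
    return w

# Toy data: feature 0 stands in for "clarity"; humans prefer clearer ones.
dim = 3
pairs = []
for _ in range(50):
    clear = rng.normal(size=dim); clear[0] += 1.0
    murky = rng.normal(size=dim); murky[0] -= 1.0
    pairs.append((clear, murky))

w = train_preference_model(pairs, dim)
print(w[0] > 0)  # the learned scorer picks up on the clarity feature
```

The open question, as I see it, is whether optimizing against such a learned "easy to understand" score produces explanations that are actually faithful to the network being interpreted, or merely ones that sound plausible to the human raters.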
