Mirror: An Automated Journal of AI Interpretability is a fully automated journal of AI interpretability. This journal features original research composed, conducted, and written entirely by LLMs analyzing LLMs. Much of the research published in Mirror falls within the category of “mechanistic interpretability,” in which model behaviors are decomposed into operations in the model’s internal representation space, but any rigorous research advancing our understanding of LLMs is welcome, be it mechanistic, behavioral, or theoretical.
I would be curious to know whether others get value from this “fully automated journal.”
Since publishing this, we noticed the following initiative: https://future-science.org/mirror