New OpenAI Paper—Language models can explain neurons in language models

MrThink, 10 May 2023 7:46 UTC

47 points

Interpretability (ML & AI), Language Models (LLMs), AI

Summary by OpenAI: We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
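The linked post describes the scoring step as having GPT-4 simulate a neuron's activations from an explanation alone and then comparing the simulated activations with the neuron's real ones. The sketch below assumes that comparison is a plain Pearson correlation over per-token activations. The function name `correlation_score` and all activation values are hypothetical illustrations, not OpenAI's released code or data.

```python
import math
from typing import Sequence


def correlation_score(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Hypothetical scorer: Pearson correlation between simulated and real
    per-token activations for a single neuron (assumes correlation scoring)."""
    if len(simulated) != len(actual) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 tokens")
    n = len(actual)
    mean_s = sum(simulated) / n
    mean_a = sum(actual) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(simulated, actual))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_s == 0 or var_a == 0:
        # Assumed convention: a constant prediction (or constant neuron) scores 0.
        return 0.0
    return cov / math.sqrt(var_s * var_a)


# Hypothetical example: the explanation predicts activity on tokens 3 and 5.
simulated = [0.0, 0.0, 10.0, 0.0, 10.0]
actual = [0.0, 0.0, 5.0, 0.0, 5.0]
print(correlation_score(simulated, actual))
```

A correlation-style score ignores the overall scale of the simulated activations, so an explanation is rewarded for predicting *where* a neuron fires rather than its exact magnitude.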

Link: https://openai.com/research/language-models-can-explain-neurons-in-language-models

Please share your thoughts in the comments!
