Yes, I think this is why laypeople new to the field are going to be confused about why interpretability work on LLMs isn't as simple as, "Uhh, obviously, just ask the LLM why it gave that answer, duh!" FYI, I recently wrote about this same topic as applied to the specific problem of Voynich translation:
Bing AI Generating Voynich Manuscript Continuations—It does not know how it knows