I’m not sure if this is the same thing, but I frequently talk to Claude about research ideas, and if the idea is close enough to a different idea that it knows about, it repeatedly collapses back into talking about the idea it’s familiar with.
One I remember from this week:
I’m looking into ways to make intermediate values more visible in the logit lens, and Claude really wants to talk about the tuned lens, which does the opposite of what I want[1]. Even if Claude itself has explained why this doesn’t make any sense, it will repeatedly suggest trying the tuned lens.
I feel like I had another case where it took forever to get it to grasp what I was even talking about, but I don’t remember the details unfortunately.
I’m not sure if this is the same thing, but I frequently talk to Claude about research ideas, and if the idea is close enough to a different idea that it knows about, it repeatedly collapses back into talking about the idea it’s familiar with.
One I remember from this week:
I’m looking into ways to make intermediate values more visible in the logit lens, and Claude really wants to talk about the tuned lens, which does the opposite of what I want[1]. Even if Claude itself has explained why this doesn’t make any sense, it will repeatedly suggest trying the tuned lens.
I feel like I had another case where it took forever to get it to grasp what I was even talking about, but I don’t remember the details unfortunately.
Specifically, the tuned lens makes the next token’s representation more clear and actively erases anything else.
Thank you for the example, this definitely counts in my mind.