See the “mental math” section of Tracing the thoughts of large language models. They say that when Claude adds two numbers, there are “two paths”.
If anyone wants to try investigating the paths for weirdly behaved modular addition problems, I’ve found a problem where Gemma-2-2B messes up in a similar way: for the prompt “15 mod 7 =”, the most likely next token is 8, followed by 1 (the correct answer).
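For reference, here is a quick check of how the two candidate tokens relate arithmetically. It involves no model at all. The observation that 8 is what you get by subtracting 7 from 15 once is just a property of the numbers, offered as something that might be worth looking for; it is an assumption about where to look, not a claim about what Gemma-2-2B actually computes.

```python
# Plain arithmetic check for the prompt "15 mod 7 =" (no model involved).
a, m = 15, 7

correct = a % m   # the true remainder
top_token = 8     # the most likely next token reported above

print(correct)                    # 1
print(top_token % m == correct)   # True: 8 and 1 are congruent mod 7
print(a - m == top_token)         # True: 8 is 15 with 7 subtracted once
```

So the top token is not an arbitrary wrong answer: it is in the same residue class as the correct one, just not fully reduced.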
Here is the prompt, available for GUI-driven circuit tracing on Neuronpedia using cross-layer transcoders:
Circuit tracer: 15 mod 7 =
I probably won’t have time to play with it because of the holiday, but it’s there for anyone who wants to! If you haven’t tried circuit tracing / attribution graphs before, you’ll probably want to check out the guide by clicking any of the question-mark-in-circle icons. Have fun! Feel free to DM me or reply to this comment if you’re stuck on something and I’ll help if I can :)