TL;DR: I got stuck on the notation [a][b][c][...]→f(a,b,c,...), and LLMs probably won’t do much better on it for now. Translate the puzzle into “find an unknown f(*args)” and the LLMs get it right with probability ~20%, depending on the model. o3-mini-high does better. Sonnet 3.7 did get it in one shot, but I had it write code for the substitutions, since it messes those up a lot.
Like others, I looked for some sort of binary operator or concatenation rule. Replacing “][” with “|” or “,” would have made this trivial. Straight string substitutions don’t work, since “[[]]” can be either 2 or “[...][1][...]” as part of a prime exponent set. The notation is the problem. In hindsight, staring at string diffs might have helped.
Turning this into an unknown f() puzzle makes it straightforward for LLMs (and humans) to solve.
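As a rough illustration of that translation step, here is a minimal Python sketch that rewrites the bracket notation as nested f(...) calls by splitting on top-level groups rather than doing raw string substitution. The helper names `split_groups` and `to_f` and the `_` placeholder for an empty group are my own, not part of the puzzle, and the sketch assumes the input is nothing but balanced brackets and plain symbols.

```python
def split_groups(s):
    """Return the contents of each top-level [...] group in s."""
    groups, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "[":
            if depth == 0:
                start = i + 1  # content begins just after the opening bracket
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
            if depth == 0:
                groups.append(s[start:i])
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return groups


def to_f(s, empty="_"):
    """Rewrite [a][b][c] as f(a,b,c), recursing into each group.

    An empty group is left as the placeholder `empty` (an assumption of
    this sketch: no meaning is assigned to it here).
    """
    if s == "":
        return empty
    if "[" not in s and "]" not in s:
        return s  # a plain symbol such as a, b, c
    return "f(" + ",".join(to_f(g, empty) for g in split_groups(s)) + ")"


print(to_f("[a][b][c]"))   # f(a,b,c)
print(to_f("[[]]"))        # f(f(_))
print(to_f("[][[]][]"))    # f(_,f(_),_)
```

The last two lines show why plain find-and-replace goes wrong: the substring “[[]]” is a whole expression in one case and a single argument in the other.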
Substitutions are then quite easy, though most of the LLMs screw one up somewhere unless they use code for the string replacements or think long enough to eventually catch their mistake.
Then it’s ~25% likely they get it in one shot, and ~100% if you mention that primes are involved or that addition isn’t. It depends on the LLM. o3-mini-high got it. Claude 3.7 got it in one shot with no hints from a fully substituted starting point, but that was best of k~=4 with lots of failures otherwise. Models have strong priors for addition as a primitive and definitely don’t approach things systematically. Suggesting they focus on single-operand evaluations (2, 4, 1/2, sqrt(2)) gets them on the right track, but there’s still a bias towards addition.
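To make the single-operand hint concrete, here is a small Python sketch of one candidate f that fits those four values. It assumes f multiplies successive primes raised to the arguments, which is my reading of “prime exponent set” and not something spelled out above. The `PRIMES` list and the function body are illustrative only.

```python
import math

# First few primes; enough for short argument lists (an arbitrary cutoff).
PRIMES = [2, 3, 5, 7, 11, 13]


def f(*args):
    """Candidate rule (assumed): 2**a * 3**b * 5**c * ..."""
    if len(args) > len(PRIMES):
        raise ValueError("too many arguments for this sketch")
    result = 1.0
    for p, a in zip(PRIMES, args):
        result *= p ** a
    return result


# Single-operand evaluations: 2, 4, 1/2, sqrt(2)
print(f(1), f(2), f(-1), f(0.5))

# Two operands multiply prime powers; no addition is involved
print(f(1, 1))  # 6.0
```

With one operand this is just 2**x, which is why those four values are a useful probe: an addition-based guess has a hard time producing 1/2 and sqrt(2) from the same rule.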