I mostly agree, but instead of having LLMs play board games, which arguably selects for war-like or politics-like capabilities, one could also use an ARC-AGI-3-like benchmark in which agents solve puzzles. Or one could use group-theoretic or combinatorial problems in text form[1], though I don’t understand how to tell simple ones apart from complex ones.
An example of a group-theoretic problem
Consider the array [1,...,n] and the permutations A, which flips (reverses) the entire array; B, which flips the first n-2 elements and leaves the last two in place; and C, which flips the first n-3 elements and leaves the last three in place. Is there a nontrivial sequence of these operations, independent of n, in which no operation is applied twice in a row, yet which returns the array to its initial state for every n?
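To make the operations concrete, here is a minimal Python sketch, assuming n ≥ 4 and using hypothetical helpers `flip_prefix`, `make_ops` and `has_no_repeats` (my names, not part of the puzzle). It also shows why the no-repeat rule is there: each flip is its own inverse, so a doubled letter such as BB simply cancels out.

```python
def flip_prefix(arr, k):
    """Return a copy of arr with its first k elements reversed (a 'flip')."""
    return arr[:k][::-1] + arr[k:]

def make_ops(n):
    """The three flips for an array of length n (assumes n >= 4)."""
    return {
        "A": lambda arr: flip_prefix(arr, n),      # flip everything
        "B": lambda arr: flip_prefix(arr, n - 2),  # keep the last two in place
        "C": lambda arr: flip_prefix(arr, n - 3),  # keep the last three in place
    }

def has_no_repeats(word):
    """True if no operation appears twice in a row, e.g. "ABCB" but not "ABBC"."""
    return all(x != y for x, y in zip(word, word[1:]))

n = 6
ops = make_ops(n)
start = list(range(1, n + 1))
print(ops["A"](start))  # [6, 5, 4, 3, 2, 1]
print(ops["B"](start))  # [4, 3, 2, 1, 5, 6]
print(ops["C"](start))  # [3, 2, 1, 4, 5, 6]

# Every flip undoes itself, so a repeated letter such as "BB" is trivial;
# this is why the puzzle forbids applying an operation twice in a row.
assert all(op(op(start)) == start for op in ops.values())
print(has_no_repeats("ABCB"), has_no_repeats("ABBC"))  # True False
```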
Hint 1
The compositions AB and BC are like cycles.
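Hint 1 can be seen directly in a small Python sketch. It assumes a word is read like a composition of functions (the rightmost letter acts first), which is the reading under which Hint 2’s “to the right” matches; `flip_prefix` and `apply_word` are hypothetical helpers. For n = 8, AB pushes 1, ..., 6 two places to the right and wraps 7 and 8 around to the front, while BC is a cyclic shift by one place on the first n-2 positions and leaves the last two fixed.

```python
def flip_prefix(arr, k):
    """Return a copy of arr with its first k elements reversed."""
    return arr[:k][::-1] + arr[k:]

def apply_word(word, arr):
    """Apply a word such as "AB" right to left: the rightmost letter acts first.
    Assumes len(arr) >= 4."""
    n = len(arr)
    prefix = {"A": n, "B": n - 2, "C": n - 3}  # how many leading elements each flips
    for letter in reversed(word):
        arr = flip_prefix(arr, prefix[letter])
    return arr

start = list(range(1, 9))  # n = 8
print(apply_word("AB", start))  # [8, 7, 1, 2, 3, 4, 5, 6]
print(apply_word("BC", start))  # [6, 1, 2, 3, 4, 5, 7, 8]
```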
Hint 2
If one permutation moves nearly everything two places to the right, while another moves nearly everything one place to the right, how can we combine them to obtain a permutation that moves only a few elements?
Hint 3
A permutation that moves only a few elements around can be turned into the identity by applying it enough times.
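Hint 3 rests on a standard fact: the order of a permutation, i.e. how many times it must be applied to get back to the identity, is the least common multiple of its cycle lengths. If only a fixed handful of elements move, and they move in the same pattern for every n, that order does not depend on n. A minimal sketch with a hypothetical `order` helper, where a permutation is a list and `perm[i]` is the position that position i is sent to:

```python
from math import gcd

def order(perm):
    """Order of a permutation: the lcm of the lengths of its cycles."""
    seen = [False] * len(perm)
    result = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and measure its length.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        result = result * length // gcd(result, length)  # lcm(result, length)
    return result

# A 3-cycle on the first three positions, everything else fixed:
for n in (5, 10, 100):
    perm = [1, 2, 0] + list(range(3, n))
    print(n, order(perm))  # the order is 3 for every n
```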
Answer
(AB(CB)^2)^4.
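The answer can be checked by brute force for small n. This is a sanity check, not a proof, and it assumes n ≥ 4 (for n = 3, C flips nothing). It uses the same hypothetical `flip_prefix`/`apply_word` helpers, with the rightmost letter acting first; since every flip is its own inverse, reading words left to right would only replace each permutation by its inverse, so whether a word returns to the start does not depend on that convention. The check also confirms the picture from the hints: AB(CB)^2 moves only the first two and the last two positions.

```python
def flip_prefix(arr, k):
    """Return a copy of arr with its first k elements reversed."""
    return arr[:k][::-1] + arr[k:]

def apply_word(word, arr):
    """Apply a word right to left (rightmost letter first); assumes len(arr) >= 4."""
    n = len(arr)
    prefix = {"A": n, "B": n - 2, "C": n - 3}
    for letter in reversed(word):
        arr = flip_prefix(arr, prefix[letter])
    return arr

core = "AB" + "CB" * 2  # AB(CB)^2 = "ABCBCB"
answer = core * 4       # (AB(CB)^2)^4, 24 letters

# No letter repeats, including where consecutive copies of core meet.
assert all(x != y for x, y in zip(answer, answer[1:]))

for n in range(4, 41):
    start = list(range(1, n + 1))
    once = apply_word(core, start)
    moved = [i for i in range(n) if once[i] != start[i]]
    # core only moves the first two and the last two positions ...
    assert moved == [0, 1, n - 2, n - 1]
    # ... and applying it four times returns the array to its initial state.
    assert apply_word(answer, start) == start

print("ok")
```

For n = 6, for instance, a single AB(CB)^2 turns [1, 2, 3, 4, 5, 6] into [6, 5, 3, 4, 1, 2], a 4-cycle on the positions 1, 2, 5 and 6.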
For example, Grok 4 (which, unfortunately, I asked in Russian) ran a BFS and, of course, found nothing worth attention. I hope that benchmarks like this won’t end up contaminated and will instead serve as reliable indicators…
[1] Which LLMs, being trained on text, could find easier than the basic physical tasks that LLMs were still failing at least as of April 14, before the release of o3.