It’s been a while since I checked: can any of the frontier models pass the 2-4-6 task for a novel rule yet?
Grok 4 was able to guess my rule of “three rational numbers.” Haven’t tested out other models yet.
https://grok.com/share/c2hhcmQtMw%3D%3D_748b1b41-eda9-4619-868e-5bb4cb022d50
EDIT: Claude Opus 4 is also able to guess the rule on the first attempt.
https://claude.ai/share/4dcd8fcf-4fcb-4d48-a18f-70c56a9c4be7