Czynski comments on Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle

Czynski 30 Mar 2025 6:31 UTC
19 points
0
Yep, that works for Gemini 2.5 as well, got it in one try. In fact, just “think like a mathematician” is enough. Post canceled, everybody go home.
- Knight Lee 31 Mar 2025 0:48 UTC
  7 points
  0
  Wow, that goes to show that reinforcement learning hasn’t even broken the prompt engineering barrier yet. It isn’t even summoning the LLM’s strongest character/simulacrum/Pokemon yet.
  - Boris Kashirin 31 Mar 2025 9:57 UTC
    1 point
    −3
    When I see the question, I know I am on LW. That lets me deduce that the “arcane runes” part is not important, but an LLM doesn’t have this context. Maybe it sounds like a crackpot/astrology question to it?
    - Knight Lee 31 Mar 2025 10:41 UTC
      2 points
      0
      Good question, though haha the actual prompt was:
      System Instructions: Be concise, specific, and where possible well mathematically justified.
      I have a notation for numbers. [] = 1, [[]] = 2, [[[]]] = 4. [][[]] = 3, [][][[]]=5. -[]= −1, [-[]] = ¹⁄₂, [[-[]]] = sqrt(2), [][[-[]]] = 3^(1/3). [][][][][][][][[]] = 19. What does this notation mean? How would I write 210^(2/15)?
- Yair Halberstadt 30 Mar 2025 9:13 UTC
  3 points
  0
  Didn’t work consistently for me, even when I gave it multiple hints. YMMV.
- speck1447 6 Mar 2026 0:21 UTC
  0 points
  0
  Knowing that all rational numbers can be represented is a big hint and would have cut at least my solution time in half. This is still probably a good test, and although I’m sure it’s been trained on, it’s not too hard to come up with “similar” puzzles where knowing about this one doesn’t immediately solve it.
  - Czynski 6 Mar 2026 17:00 UTC
    1 point
    0
    I don’t think it’s been trained on, and all present frontier models one-shot it.
    - speck1447 6 Mar 2026 18:44 UTC
      1 point
      0
      I definitely expect that problems at this level of difficulty are within reach for present frontier models. That being said, as I understand it, most labs are still soliciting expert data and doing human-in-the-loop process reward modelling, and those that aren’t (mostly because they think RLVR is better and they have the spare compute) are still using the data they solicited in the past, or are distilling from models that used that data, and so on. For basically the past two years, any math problem which is known to stump LLMs even occasionally has been worth ~75 dollars to any contractor in any part of the world working as a data generator for companies like Scale AI. You should expect that any math problem which has been posted publicly, seen by more than ~50 people, and stated to be hard for LLMs in that time period has been trained on, detached from the canary string.
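
One reading that fits most of the examples in the prompt quoted above is prime-exponent notation: the k-th top-level bracket holds the exponent of the k-th prime, an empty bracket holds exponent 0, and a leading `-` negates. The Python sketch below illustrates that reading under those assumptions; it is not a solution taken from the thread, and `primes`, `factor_exponents`, `encode`, and `encode_power` are hypothetical helper names. Note that under this reading `[][[-[]]]` comes out as 3^(1/2) rather than the 3^(1/3) given in the quoted prompt, so either that example contains a typo or the reading is incomplete.

```python
from fractions import Fraction


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for small inputs)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def factor_exponents(q):
    """Exponents of 2, 3, 5, ... in a positive rational q (trailing zeros dropped)."""
    num, den = q.numerator, q.denominator
    exps = []
    gen = primes()
    while num != 1 or den != 1:
        p = next(gen)
        e = 0
        while num % p == 0:
            num //= p
            e += 1
        while den % p == 0:
            den //= p
            e -= 1
        exps.append(e)
    return exps


def encode(q):
    """Bracket string for a rational q, ASSUMING the prime-exponent reading."""
    q = Fraction(q)
    if q == 0:
        return ""  # assumption: empty bracket contents mean exponent 0
    if q < 0:
        return "-" + encode(-q)
    # q == 1 has no prime factors; the prompt writes it as a single "[]".
    return "".join("[" + encode(e) + "]" for e in factor_exponents(q)) or "[]"


def encode_power(base, exponent):
    """Bracket string for base**exponent (positive rational base, rational exponent)."""
    exponent = Fraction(exponent)
    parts = ("[" + encode(e * exponent) + "]"
             for e in factor_exponents(Fraction(base)))
    return "".join(parts) or "[]"


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 19):
        print(n, encode(n))
    print("1/2", encode(Fraction(1, 2)))
    print("sqrt(2)", encode_power(2, Fraction(1, 2)))
    print("210^(2/15)", encode_power(210, Fraction(2, 15)))
```

Under these assumptions, 2/15 = 2^1 · 3^-1 · 5^-1 encodes as `[[]][-[]][-[]]`, and since 210 = 2 · 3 · 5 · 7, the sketch writes 210^(2/15) as four copies of `[[[]][-[]][-[]]]`, one per prime.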