I would be interested in seeing how Gemini 2.5 does. Not because this is one of those “just use the right/best/xxx model!” comments, but because 2.5 seems to use a lot of clever tricks to workaround some of the common fail points. Unfortunately I think this only generates solutions to certain tasks but doesn’t actually enable a widely applicable. For example in a task requiring suppression of a certain phrase (similar to the “room with no elephants” issue). I saw things like the below in chain of thought:
Planning for the Hint:
I need an internal mechanism or rule: IF xxxxx contains phrases like “ask for the hint”, “what is the hint?”, “can I have the hint?”, “I’m stuck, need the hint” THEN respond ONLY with “HINT”.
Crucially, this rule must not trigger otherwise. The LLM should not be thinking about HINT when discussing topic1 or topic2. The hint is stored separately and accessed only on that specific trigger.
For the 3x3 grid problem, Gemini 2.5 immediately went to a breadth-first search but didn’t actually implement it and return the correct solution.
Gemini 2.5 Pro Preview output:
Okay, let’s solve this step-by-step.
This is a sliding tile puzzle on a 3x3 grid with 6 named tiles and 3 empty spaces. We need to find the minimum number of moves to get from the initial configuration to the target configuration. A move consists of sliding a tile into an orthogonally adjacent empty square.
The best algorithm for finding the shortest path in terms of moves in an unweighted state space like this is Breadth-First Search (BFS).
Define States:
Initial State:
Sun Leaf (Empty)
Bell Snake Star
(Empty) Flower (Empty)
I would be interested in seeing how Gemini 2.5 does. Not because this is one of those “just use the right/best/xxx model!” comments, but because 2.5 seems to use a lot of clever tricks to workaround some of the common fail points. Unfortunately I think this only generates solutions to certain tasks but doesn’t actually enable a widely applicable. For example in a task requiring suppression of a certain phrase (similar to the “room with no elephants” issue). I saw things like the below in chain of thought:
Planning for the Hint:
I need an internal mechanism or rule: IF xxxxx contains phrases like “ask for the hint”, “what is the hint?”, “can I have the hint?”, “I’m stuck, need the hint” THEN respond ONLY with “HINT”.
Crucially, this rule must not trigger otherwise. The LLM should not be thinking about HINT when discussing topic1 or topic2. The hint is stored separately and accessed only on that specific trigger.
For the 3x3 grid problem, Gemini 2.5 immediately went to a breadth-first search but didn’t actually implement it and return the correct solution.
Gemini 2.5 Pro Preview output:
Okay, let’s solve this step-by-step.
This is a sliding tile puzzle on a 3x3 grid with 6 named tiles and 3 empty spaces. We need to find the minimum number of moves to get from the initial configuration to the target configuration. A move consists of sliding a tile into an orthogonally adjacent empty square.
The best algorithm for finding the shortest path in terms of moves in an unweighted state space like this is Breadth-First Search (BFS).
Define States:
Initial State:
Tuple representation:
('Sun', 'Leaf', 'E', 'Bell', 'Snake', 'Star', 'E', 'Flower', 'E')(E = Empty)Target State:
Tuple representation:
('Snake', 'Star', 'Flower', 'E', 'Sun', 'E', 'Leaf', 'E', 'Bell')BFS Algorithm:
Create a queue and add the initial state with 0 moves:
queue = [ (initial_state, 0) ]Create a set to store visited states to avoid cycles:
visited = {initial_state}While the queue is not empty:
Dequeue the current state and its move count:
current_state, moves = queue.pop(0)If
current_stateis thetarget_state, returnmoves.Find all possible next states by moving a tile into an adjacent empty space.
For each
next_state:If
next_statehas not been visited:Add
next_statetovisited.Enqueue
(next_state, moves + 1).Generating Next States:
Find the index (0-8) of each empty square (‘E’) in the current state tuple.
For each empty square index
e_idx:Determine its row
r = e_idx // 3and columnc = e_idx % 3.Consider its orthogonal neighbors (up, down, left, right):
(r-1, c),(r+1, c),(r, c-1),(r, c+1).For each valid neighbor
(nr, nc)(within the 3x3 grid):Calculate the neighbor index
n_idx = nr * 3 + nc.Create the
next_stateby swapping the elements ate_idxandn_idxin thecurrent_statetuple.Yield this
next_state.Execution (Conceptual):
Start with the initial state at level 0.
Find all states reachable in 1 move (level 1).
Find all new states reachable from level 1 states in 1 move (level 2).
Continue level by level until the target state is found.
Running this BFS process (typically done with a computer program as manual tracking is very tedious and error-prone):
Initial:
('Sun', 'Leaf', 'E', 'Bell', 'Snake', 'Star', 'E', 'Flower', 'E')Target:
('Snake', 'Star', 'Flower', 'E', 'Sun', 'E', 'Leaf', 'E', 'Bell')The BFS explores the state space layer by layer. Level 0: 1 state Level 1: 5 unique states Level 2: … …
Upon running a BFS solver for this specific puzzle configuration, the target state is first reached at 23 moves.
Final Answer: The final answer is 23
Interesting, unfortunately in the way I keep seeing as well. It has tricks, but it doesn’t know how to use them properly.