I occasionally test LLMs by giving them a chess diagram and having them answer questions about the position, ranging from very simple ones to questions requiring some calculation or insight (rough sketch of the setup below).
Gemini 2.5 Pro also impressed me as the first LLM that could at least perceive the position correctly, even if it quickly went off the rails as soon as some reasoning was required.
Unlike with manufacturing, I expect this to get a lot better as soon as any of the labs makes a real effort.
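For concreteness, here is a minimal sketch of the kind of test I mean, using python-chess to render the diagram. The `ask_vision_model` call is a hypothetical stand-in for whichever multimodal API you use, not a real SDK function.

```python
# Sketch of a chess-diagram test, assuming python-chess for rendering.
# ask_vision_model(image, prompt) below is a placeholder, not a real API.
import chess
import chess.svg

def diagram_from_fen(fen: str) -> str:
    """Render a position as an SVG diagram that can be rasterized and sent to a model."""
    board = chess.Board(fen)
    return chess.svg.board(board, size=400)

# Example position (illustrative only).
fen = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"
svg = diagram_from_fen(fen)

# Questions range from pure perception to ones needing calculation.
questions = [
    "List every piece on the board with its square.",  # perception only
    "Which pieces are currently undefended?",          # light reasoning
    "What is White's best move, and why?",             # calculation/insight
]

# for q in questions:
#     print(q, "->", ask_vision_model(svg, q))
```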
Why do you think labs aren’t making a focused effort on this problem? Is vision understanding not valuable for an automated software engineer or AI scientist?
I meant chess-specific reasoning.