I occasionally test LLMs by giving them a chess diagram and having them answer questions about the position, ranging from very simple ones to questions requiring some calculation or insight (rough sketch of the setup below).
Gemini 2.5 Pro also impressed me as the first LLM that could at least perceive the position correctly, even if it quickly went off the rails as soon as some reasoning was required.
Unlike with manufacturing, I expect this to get a lot better as soon as any of the labs makes a real effort.
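For concreteness, here is a minimal sketch of the kind of test I mean, using python-chess to render the diagram. The `ask_vision_model` call is a hypothetical stand-in for whichever multimodal API you use, not a real SDK function.

```python
# Sketch of a chess-diagram test, assuming python-chess for rendering.
# ask_vision_model(image, prompt) below is a placeholder, not a real API.
import chess
import chess.svg

def diagram_from_fen(fen: str) -> str:
    """Render a position as an SVG diagram that can be rasterized and sent to a model."""
    board = chess.Board(fen)
    return chess.svg.board(board, size=400)

# Example position (illustrative only).
fen = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"
svg = diagram_from_fen(fen)

# Questions range from pure perception to ones needing calculation.
questions = [
    "List every piece on the board with its square.",  # perception only
    "Which pieces are currently undefended?",          # light reasoning
    "What is White's best move, and why?",             # calculation/insight
]

# for q in questions:
#     print(q, "->", ask_vision_model(svg, q))
```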
Why do you think labs aren’t making a focused effort on this problem? Is vision understanding not valuable for an automated software engineer or AI scientist?
I meant chess-specific reasoning.