Afterimage comments on Debunk the myth -Testing the generalized reasoning ability of LLM

Afterimage 12 Apr 2025 8:15 UTC
1 point
It does seem like LLMs struggle with “trick” questions that are ironically close to well-known trick questions but have an easier answer. Simple Bench is doing much the same thing, and models do seem to be improving over time. I guess the important question is whether this flaw will affect more sophisticated work.
On another note, I find your question 2 to be almost incomprehensible, and my first instinct would be to try to trap the bug by feeling for it with my hands.