I suggest rewriting this as “Present LLMs can’t reliably follow rules”. Doing so is clearer and reduces potential misreading. Saying “LLM” is often ambiguous; it could be the current SoTA, but sometimes it means an entire class.
Stronger claims, such as “Vanilla LLMs (without tooling) cannot and will not be able to reliably follow rule-sets as complicated as chess, even with larger context windows, better training, etc … and here is why.” would be very interesting, if there is evidence and reasoning behind them.
I suggest rewriting this as “Present LLMs can’t reliably follow rules”. Doing so is clearer and reduces potential misreading. Saying “LLM” is often ambiguous; it could be the current SoTA, but sometimes it means an entire class.
Stronger claims, such as “Vanilla LLMs (without tooling) cannot and will not be able to reliably follow rule-sets as complicated as chess, even with larger context windows, better training, etc … and here is why.” would be very interesting, if there is evidence and reasoning behind them.