It can’t do math reliably? Let it use a calculator. It improves with more parameters? Scale it up and see if it helps even more.
It’s worth pointing out that using a calculator was a hack and modern LLMs with reasonable tokenizers can do fairly impressive math without a calculator (much better than most humans).
I hadn’t been following that story, but I’d heard that tokenization problems were a likely issue, so it’s interesting to see confirmation.