Full quote on Mathstodon for others’ interest:

In https://chatgpt.com/share/94152e76-7511-4943-9d99-1118267f4b2b I gave the new model a challenging complex analysis problem (which I had previously asked GPT4 to assist in writing up a proof of in https://chatgpt.com/share/63c5774a-d58a-47c2-9149-362b05e268b4 ). Here the results were better than previous models, but still slightly disappointing: the new model could work its way to a correct (and well-written) solution *if* provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent (static simulation of a) graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of “(static simulation of a) competent graduate student” is reached, at which point I could see this tool being of significant use in research level tasks. (2/3)

This o1 vs MathOverflow experts comparison was also interesting:

I posed the identical question to my version of #o1 and it returned a perfect answer: https://chatgpt.com/share/66e7153c-b7b8-800e-bf7a-1689147ed21e . Admittedly, the above MathOverflow post could conceivably have been included in the training data of the model, so this may not necessarily be an accurate evaluation of its semantic search capabilities (in contrast with the first example I shared, which I had mentioned once previously on Mastodon but without fully revealing the answer). Nevertheless it demonstrates that this tool is on par with question and answer sites with respect to high quality answers for at least some semantic search queries. (1/2)
(I believe the version he tested was what later became o1-preview.)