A second, related thought is that whenever I read statements like “For example, while GPT-4 scored very well on the math SAT, it still made elementary-school mistakes on basic arithmetic questions,” I think, “This is true of me, and AFAIK of all humans, as well.” I think such observations are therefore mostly irrelevant to the core question, until and unless we can characterize important differences in when and why it makes such mistakes compared to humans (such differences do exist, and are being studied and characterized).
The distribution of mistakes is very different, and, I think, illuminates the differences between human minds and LLMs. (Epistemic status: I have not thoroughly tested the distribution of AI mistakes against humans, nor have I read thorough research testing it empirically. I could be wrong about the shape of these distributions.) It seems like LLM math ability cuts off much more sharply (around 8 digits, I believe), whereas for humans, error rates go up only gradually as we add digits. (A sketch of how one might test this empirically is below.)
This makes me somewhat more inclined towards slow timelines. However, it bears repeating that LLMs are not human-brain-sized yet. Maybe when they get to around human-brain-sized, the distribution of errors will look more human.
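For what it's worth, here is a minimal sketch of how one might measure that cutoff: generate random n-digit addition problems and track accuracy as a function of digit count. This is just an illustration of the test design, not results; the `ask_model` function is a hypothetical stand-in for however you actually query the LLM, and the 8-digit figure is the claim being tested, not an established fact.

```python
import random

def make_problem(n_digits: int) -> tuple[str, int]:
    """Generate a random n-digit addition problem and its correct answer."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return f"What is {a} + {b}? Answer with only the number.", a + b

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a call to whatever LLM is being tested."""
    raise NotImplementedError

def accuracy_by_digits(max_digits: int = 15, trials: int = 50) -> dict[int, float]:
    """Estimate arithmetic accuracy as a function of operand length."""
    results = {}
    for n in range(1, max_digits + 1):
        correct = 0
        for _ in range(trials):
            prompt, answer = make_problem(n)
            reply = ask_model(prompt)
            # Tolerant parsing: strip commas and whitespace before comparing.
            if reply.strip().replace(",", "") == str(answer):
                correct += 1
        results[n] = correct / trials
    return results

# If the sharp-cutoff claim is right, accuracy should stay near 1.0 and then
# drop steeply somewhere around 8 digits, rather than degrading gradually
# the way human error rates seem to.
```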