There’s lots of coding training data but very little training data for producing documents of a specific length. I suspect that if we added a bunch of “Write ### words about X” training data, LLMs would suddenly become good at it.
My point was that it’s surprising that AI is so bad at generalizing to tasks it hasn’t been trained on. I would’ve predicted much better generalization (I’ve also added a link to a post with more examples). This is also why I think creating AGI will be very hard, unless there’s a massive paradigm shift (some new NN architecture or a new way to train NNs).
EDIT: It’s not “Gemini can’t count how many words are in its output” that surprises me, it’s “Gemini can’t count how many words are in its output, given that it can code in Python and a dozen other languages and can also do calculus.”
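To underline the contrast: the word-counting task the model fails at is trivial to express in the very language it codes fluently. A minimal sketch (whitespace-splitting as a simple proxy for “words”):

```python
# Counting words in a piece of text is a one-liner in Python --
# exactly the kind of program these models write without difficulty.
def count_words(text: str) -> int:
    # split() with no arguments splits on any run of whitespace
    return len(text.split())

print(count_words("Gemini can code in Python"))  # 5
```

The model can produce this function on request, yet can’t reliably apply the same operation to its own output, which is the generalization gap in question.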
The AIs that turned out to be good at art and bad at math were probably a surprise for everyone.