The crucial takeaway is that none of this is due to technical limitations—it is all by choice.
I do not think this is true. The model does try to make metaphors; the metaphors just do not make sense.
See mine:
Unfortunately, Claude’s prose here leaves much to be desired:
“A fever brought down will rise again somewhere” is not an example of a remedy extracting a cost any more than Whac-A-Mole is an example of mallets producing moles.
“A wound closed by magic leaves its scar on the world, invisible but present” is merely an assertion: the mechanism of the magic is not explained and cannot be presumed to be understood by the reader. The writer also fails to justify that the scar is a weighty cost. If a wise healer let me bleed out because he did not want to cause a scar, I would be more than mildly disappointed.
“To cure a blight may curse a harvest three valleys over.” Again, the mechanism for this is not remotely explained.
“Power is not the difficult thing. Restraint is the difficult thing.” Claude sure likes making claims! Why does it matter that restraint is difficult? Why is restraint difficult? What does acting with restraint look like?
Outside of these excerpts, I have seen LLMs make many attempts at metaphor that are deeply imperfect or incoherent, and this generalizes to other kinds of figurative language. For example, models often attempt but struggle to maintain parallelism between paragraphs or list items.
From a friend’s conversation with ChatGPT (which he highlighted as good prose...):
The really important thing is that America is not just “the West.” It is a very specific mutation of the West. More moralistic than Europe, more religious in structure than it admits, less rooted, more expansive, more energetic, more lonely.
Note the flawed parallelism introduced by “it admits,” and the subsequent confusion regarding the subject of comparison.
Finally, I also challenge you to produce good prose with a Kimi or DeepSeek model.
I appreciate your scientific spirit.