The latest generation of thinking models can definitely do agentic frontend development
But does that imply that they’re general-purpose competent agentic programmers? The answers here didn’t seem consistent with that claim. Does your experience significantly diverge from those answers?
My current model is that it’s the standard “jagged capabilities frontier” on a per-task basis: LLMs are good at sufficiently “templated” projects and fall apart on everything else. Their proficiency at frontend development is then mostly a sign of frontend code being relatively standardized[1], not of them being sufficiently agent-y.
I guess quantifying it as “20% of the way from an amateur to a human pro” isn’t necessarily incorrect, depending on how you operationalize this number. But I think it’s also arguable that they haven’t actually 100%’d even amateur general-coding performance yet.
I. e., that most real-world frontend projects have incredibly low description length if expressed in the dictionary of some “frontend templates”, with this dictionary comprehensively represented in LLMs’ training sets.
(To clarify: These projects’ Kolmogorov complexity can still be high, but their cross-entropy relative to said dictionary is low.
Importantly, the cross-entropy relative to a given competent programmer’s “template-dictionary” can also be high, creating the somewhat-deceptive impression of LLMs being able to handle complex projects. But that apparent capability would then fail to generalize to domains in which real-world tasks aren’t short sentences in some pretrained dictionary. And I think we are observing that with e. g. nontrivial backend coding?)
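A rough way to cash out this distinction (my own notation, purely illustrative, not something from the original discussion): treat the “template dictionary” as a probability model $Q$ over projects. Then, for a project $P$,

$$K(P) = \min_{p\,:\,U(p) = P} |p| \qquad \text{(Kolmogorov complexity: length of the shortest program that outputs } P \text{ on a universal machine } U\text{)}$$

$$\ell_Q(P) = -\log_2 Q(P) \qquad \text{(codelength of } P \text{ under } Q\text{; averaging this over real projects gives the cross-entropy mentioned above)}$$

The claim is then that typical frontend projects satisfy $\ell_Q(P) \ll K(P)$ when $Q$ is the dictionary baked into LLMs’ training sets, while $\ell_{Q'}(P)$ for an individual programmer’s dictionary $Q'$ can remain large. Note this gap is only possible because the dictionary itself is enormous: the standard bound $K(P) \le \ell_Q(P) + K(Q) + O(1)$ means a project can be cheap to describe relative to $Q$ yet still have high absolute complexity, with the complexity “hiding” in $K(Q)$.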