One thing I’ve noticed is that current models like Claude 3.5 Sonnet can now generate non-trivial programs of around 100 lines, such as small games, that work in one shot with no syntax or logic errors. I don’t think that was possible with earlier models like GPT-3.5.
My impression is that they are improving steadily at the kinds of coding tasks that would show up in an undergrad CS curriculum, but much more slowly at nonstandard or highly technical tasks.