This seems like a pretty brutal test.
My experiences with Opus 4.6 so far are mixed:
It did a very nice job designing and building a web UI for an unusually messy and inconsistent database setup.
As a conversational sounding board, it has been getting caught up in overly facile and ultimately complimentary analogies, to the point that I don’t really want to talk to it as much as I might have with 4.5. Sample size: way too small to conclude anything.
It has been surprisingly good in math-related conversations, but I don’t have a great baseline with 4.5 for comparison.