My hypothesis is that a given model might succeed on some coding task 80% of the time, and people are well-calibrated to that level of success. Then a new model comes out and succeeds on that coding task first try, feeling like 100% (but n=1). They run to social media and say “this is amazing!!!” then get back to work. Over the next few weeks, they try dozens more times and it sometimes fails, and they perceive the model’s success rate drop from 100% to a more realistic 90%. They run to social media and say “the model is so dumb now, it fails tasks it used to do!”. They then become well-calibrated to the model’s new 90% level, and they are now primed and ready to repeat the cycle.
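The sampling effect behind this is easy to see in simulation. Here is a minimal sketch (the function name and parameters are illustrative, not from the post): simulate coin flips at a fixed true success rate and track the running estimate. After one lucky success the estimate is exactly 100%, and only over dozens of tries does it drift down toward the true rate.

```python
import random

def running_success_rate(p, n_trials, seed=0):
    """Simulate n_trials Bernoulli attempts with true success
    probability p; return the running estimate after each trial."""
    rng = random.Random(seed)
    successes = 0
    estimates = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            successes += 1
        estimates.append(successes / i)
    return estimates

# With a true rate of 0.9, an early success makes the running
# estimate read exactly 1.0; over many trials it settles near 0.9.
est = running_success_rate(0.9, 50)
print(est[0], est[-1])
```

The early estimates are the "this is amazing!!!" phase; the settled tail is the "it got dumber" recalibration, even though the underlying rate never changed.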