I can't help but see these as things AI companies are already doing to improve capabilities… Many even boast about some form of "data flywheel". Wouldn't building out these evals just give them more targets to chase?
Isn’t that the point—if you can bring everything in-distribution via eval+feedback loop, then why worry about OOD generalization?