I can't help but see these as things AI companies are already doing to improve capabilities… Many even boast about some form of "data flywheel". Wouldn't building out these evals just give them more targets to chase?
Isn’t that the point—if you can bring everything in-distribution via eval+feedback loop, then why worry about OOD generalization?