I imagine we’d both agree that there can and should be a lot of evals and attempts at robustness / reliability for small / low-level systems? It seems like the disagreement is in how useful such work will be for critical and broader alignment challenges.