My friend's (M.K., he's on GitHub) honorable aim is to establish a term in the AI evals field: cognitive asymmetry, the generation-verification complexity gap for model-as-judge evals.
What we want are tasks with a clear intelligence-to-solve vs. intelligence-to-verify-a-solution gap, i.e. only X00-B LMs have a shot at solving them, but an X-B model is already strong at verifying. I hope it fits nicely into the incremental, iterative alignment scaling playbook.
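To make the idea concrete, here is a minimal sketch of what such an eval harness could look like. The names (`run_eval`, `toy_generate`, `toy_verify`) and the factoring stand-in task are my own illustration, not an established benchmark or M.K.'s actual setup: the generator stands in for the large solver model, the verifier for the much smaller judge model.

```python
# Sketch of a generate-then-verify eval loop for the generation-verification gap.
from typing import Callable, Iterable

def run_eval(
    generate: Callable[[str], str],      # "solver" role: the hard direction
    verify: Callable[[str, str], bool],  # "judge" role: the easy direction
    tasks: Iterable[str],
) -> float:
    """Return the fraction of generated solutions the judge accepts."""
    results = [verify(task, generate(task)) for task in tasks]
    return sum(results) / len(results)

# Toy stand-ins: factoring a semiprime is generation-hard, while checking the
# product is verification-cheap -- the same asymmetry the term is about.
def toy_generate(task: str) -> str:
    n = int(task)
    d = next(d for d in range(2, n) if n % d == 0)  # brute-force "solver"
    return f"{d}*{n // d}"

def toy_verify(task: str, answer: str) -> bool:
    a, b = (int(x) for x in answer.split("*"))
    return a > 1 and b > 1 and a * b == int(task)

if __name__ == "__main__":
    print(run_eval(toy_generate, toy_verify, ["91", "221", "8633"]))  # -> 1.0
```

In the real setting, `generate` would wrap the X00-B model and `verify` the small X-B judge; the eval is interesting precisely when the judge's accept/reject decisions stay reliable even though it could never produce the solutions itself.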