[Question] Is there a taxonomy & catalog of AI evals?
As a newbie, I am trying to get oriented in the AI alignment field. To reach a qualitative or quantitative verdict on a system's alignment, we need AI evaluations. I was wondering if there is an anchor point for evaluation categories and resources.
What I found so far
A Systematic Literature Review of AI Safety Evaluation Methods identifies evaluation target properties (Capability, Propensity, Control), techniques (Behavioral, Internal), and frameworks (Model-Organism/Technical and Governance)
Evaluations chapter in AI Safety Atlas
Draft catalog from AI Verify
ADeLe
AI Alignment: A Comprehensive Survey, which proposes Robustness, Interpretability, Controllability, and Ethicality (RICE) as the key objectives of AI alignment, and further identifies the research categories Learning from Feedback, Learning under Distribution Shift, Assurance, and Governance
GitHub registries: awesome-ai-eval, OpenAI Evals registry
Another survey
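To make the overlap between these taxonomies concrete, here is a minimal sketch of how a catalog entry might be tagged along two of the axes above (the target-property/technique axes from the systematic literature review, and the RICE objectives from the comprehensive survey). The schema and the sample entries are my own hypothetical illustration, not taken from any of the resources:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalEntry:
    """One catalog entry, tagged along axes from two of the surveyed taxonomies."""
    name: str
    target_property: str  # Capability | Propensity | Control (per the SLR)
    technique: str        # Behavioral | Internal (per the SLR)
    rice_objective: str   # Robustness | Interpretability | Controllability | Ethicality

# Hypothetical sample entries for illustration only.
CATALOG = [
    EvalEntry("jailbreak resistance suite", "Propensity", "Behavioral", "Robustness"),
    EvalEntry("probe-based deception check", "Propensity", "Internal", "Interpretability"),
    EvalEntry("shutdown compliance test", "Control", "Behavioral", "Controllability"),
]

def by_property(prop: str) -> list[EvalEntry]:
    """Filter the catalog by evaluation target property."""
    return [e for e in CATALOG if e.target_property == prop]
```

Even this toy schema shows why a shared taxonomy matters: the same eval lands in different buckets depending on which framework's axes you pick.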
These are great resources with somewhat overlapping concepts. I am curious whether the community considers any of them (or anything else) a generally accepted taxonomy and catalog of AI evaluations.
We do need a formal science of evaluations. Such a formalism would expose gaps in current frameworks, and would provoke lines of inquiry (such as value alignment) and solutions (such as understanding-based evaluations).