I think this is going to be wrong as an approach. Weight and temperature are properties of physical systems at specific points in time, and can be measured coherently because we understand laws about those systems. Alignment could be measured as a function of a particular system at a specific point in time, once we have a clear understanding of what? All of human values?
I’m not arguing that “alignment” specifically is the thing we should be measuring.
More generally, a useful mantra is “we do not get to choose the ontology”. In this context, it means that there are certain things which are natural to measure (like temperature and weight), and we do not get to pick what they are; we have to discover what they are.
That’s correct. My point is that measuring goals which are not natural to measure will, in general, have many more problems with Goodharting and similar misoptimization and overoptimization pressures. And other approaches can be more productive, or at least more care is needed with design of metrics rather than discovery of what to measure and how.
I think this is going to be wrong as an approach. Weight and temperature are properties of physical systems at specific points in time, and can be measured coherently because we understand laws about those systems. Alignment could be measured as a function of a particular system at a specific point in time, once we have a clear understanding of what? All of human values?
I’m not arguing that “alignment” specifically is the thing we should be measuring.
More generally, a useful mantra is “we do not get to choose the ontology”. In this context, it means that there are certain things which are natural to measure (like temperature and weight), and we do not get to pick what they are; we have to discover what they are.
That’s correct. My point is that measuring goals which are not natural to measure will, in general, have many more problems with Goodharting and similar misoptimization and overoptimization pressures. And other approaches can be more productive, or at least more care is needed with design of metrics rather than discovery of what to measure and how.