One element you’ve omitted so far is the relationship to Goodhart’s Law: the harder you optimize for something, the more likely, and the worse, the unintended consequences tend to be if what you specified to be optimized isn’t in fact exactly what you actually want.
A further missing element is Value Learning: rather than attempting to specify in detail what you want optimized, you could ask the optimizer both to find out what you really want and then to optimize that.
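The Goodhart effect above can be sketched numerically. This is a minimal, illustrative toy (all numbers and names are made up for the example): the proxy score being optimized is only a noisy correlate of the true value, and the harder you select on the proxy, the wider the gap between what the proxy promises and what you actually get.

```python
import random

random.seed(0)

# Toy model: each candidate has a true value, and a proxy score that
# measures it imperfectly (true value plus independent noise).
candidates = []
for _ in range(100_000):
    true_value = random.gauss(0, 1)
    proxy = true_value + random.gauss(0, 1)  # imperfect measurement
    candidates.append((proxy, true_value))

candidates.sort(reverse=True)  # rank by proxy score, best first

def selection_gap(top_n):
    """Average (proxy - true value) among the top_n candidates by proxy."""
    top = candidates[:top_n]
    return sum(p - t for p, t in top) / top_n

gap_mild = selection_gap(10_000)  # mild pressure: top 10%
gap_extreme = selection_gap(10)   # extreme pressure: top 0.01%

# The gap grows with optimization pressure: the most extreme proxy
# scores are disproportionately inflated by noise rather than by
# genuinely high true value.
print(f"mild gap: {gap_mild:.2f}  extreme gap: {gap_extreme:.2f}")
```

The point of the sketch is only the qualitative shape: the shortfall between proxy and true value is small under mild selection and grows as optimization pressure increases.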
Ooh, Value Learning sounds cool – I’ll check that out.
And yup, explicitly noting Goodhart’s Law would have been nice.
Thanks for the comment!