I think this is a great idea, except that on easy mode “a good specification of values and ethics to follow” means a few pages of text (or even just the prompt “do good things”), while other times “a good specification of values” is a learning procedure that takes input from a broad sample of humanity, and has carefully-designed mechanisms that influence its generalization behavior in futuristic situations (probably trained on more datasets that had to be painstakingly collected), and has been engineered to work smoothly with the reasoning process and not encourage perverse behavior.
I think this is a great idea, except that on easy mode “a good specification of values and ethics to follow” means a few pages of text (or even just the prompt “do good things”), while other times “a good specification of values” is a learning procedure that takes input from a broad sample of humanity, and has carefully-designed mechanisms that influence its generalization behavior in futuristic situations (probably trained on more datasets that had to be painstakingly collected), and has been engineered to work smoothly with the reasoning process and not encourage perverse behavior.