I think I understand now. My best guess is that if your proof were applied to my example, the conclusion would be that my example only pushes the problem back: to specify human values via a method like the one I was suggesting, you would still need to specify the part of the algorithm that “feels like” it has values, which is a similar type of problem.
I think I hadn’t grokked that your proof says something about the space of all abstract value/knowledge systems, whereas my thinking was solely about humans. As I understand it, an algorithm that picks out human values from a simulation of the human brain will correspondingly do worse on other types of mind.