Why do you consider the ontology problem to be a complete show-stopper? It seems to me there are at least two other approaches to it that we can take:
By “show-stopper” I simply mean that we absolutely have to solve it in some way. Syntactic preference is one way; what you suggest could conceivably be another.
You claim that syntactic preference solves the ontology problem, but I have even fewer ideas about how to extract the syntactic preference of arbitrary programs.
An advantage I see with syntactic preference is that it’s at least more or less clear what we are working with: formal programs and strategies. This opens up the whole palette of possible approaches to try on the remaining problems. With the “all mathematical structures” approach, we still don’t know what we are supposed to be talking about, so as of now there is no way forward even at that first step. At least syntactic preference allows us to take one step further, onto firmer ground, even though admittedly it’s unclear what to do next.
second, we do know that we can’t write out our preference manually.
How do we know that? It’s not clear to me that there is any more evidence for “we can’t write out our preferences manually”, than for “we can’t build an artificial general intelligence manually”.
I mean the “complexity of value”/“value is fragile” thesis. It seems to me quite convincing, and from the opposite direction, I have the “preference is detailed” conjecture resulting from the nature of preference in general. For “is it possible to build AI”, we don’t have similarly convincing arguments (and really, it’s an unrelated claim that only contributes a connotation of error in judgment, without giving an analogy in the method of arriving at that judgment).
I mean the “complexity of value”/“value is fragile” thesis.
I agree with “complexity of value” in the sense that human preference, as a mathematical object, has high information content. But I don’t see a convincing argument from this premise to the conclusion that the best course of action for us to take, in the sense of maximizing our values under the constraints that we’re likely to face, involves automated extraction of preferences, instead of writing them down manually.
Consider the counter-example of someone who has the full complexity of human values, but would be willing to give up all of their other goals to fill the universe with orgasmium, if that choice were available. Such an agent could “win” by building a superintelligence with just that one value. How do we know, at this point, that our values are not like that?
Whatever the case is with how acceptable the simplified values are, automated extraction of preference seems to be the only way to actually knowably win, rather than striking a compromise, which is what simplified preference is suggested to be. We must decide from the information we have; how would you come to know that a particular simplified preference definition is any good? I don’t see a way forward without first having a more precise moral machine than a human (but then, we won’t need to consider simplified preference).