To rephrase my comment on your previous post, I think the right solution isn’t to extrapolate our preferences, but to extrapolate our philosophical abilities and use those to figure out what to do with our preferences. There’s no unique way to repair a utility function that assumes a wrong model of the world, or to reconcile two utility functions within one agent, but if the agent is also a philosopher there might be hope.
Do you expect that there will be a unique way of doing this, too?
Many philosophical problems seem to have correct solutions, so I have some hope. The Absent-Minded Driver problem, for example, has a clear correct solution. Formalizing the intuitive process that leads to such solutions might be safer than solving every problem up front (possibly incorrectly) and coding the answers into FAI.
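As a concrete illustration, here is a minimal sketch of that solution, assuming the standard Piccione–Rubinstein payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing past both). The driver commits in advance to a single probability p of continuing at whichever intersection he finds himself at, since he can’t tell them apart:

```python
# Planning-optimal Absent-Minded Driver: pick one probability p of
# continuing at each (indistinguishable) intersection, then maximize
# the expected payoff computed from the start of the trip.

def expected_payoff(p: float) -> float:
    exit_first = (1 - p) * 0       # exit at the first intersection
    exit_second = p * (1 - p) * 4  # continue once, then exit
    past_both = p * p * 1          # continue at both intersections
    return exit_first + exit_second + past_both

# Coarse grid search; the analytic optimum is p = 2/3 with value 4/3.
best_p = max((i / 1000 for i in range(1001)), key=expected_payoff)
print(best_p, expected_payoff(best_p))  # ~0.667, ~1.333
```

That there is a single derivable optimum (p = 2/3) is part of what makes the problem feel solvable rather than merely verbal.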
It seems that the problems to do with rationality have correct solutions, but not the problems to do with values.
Why? vNM utility maximization seems like a philosophical idea that’s clearly on the right track. There might be other such ideas about being friendly to imperfect agents.
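To spell out what vNM (von Neumann–Morgenstern) utility maximization amounts to: an agent with consistent preferences over lotteries acts as if it assigns a utility to each outcome and ranks lotteries by expected utility. A minimal sketch, with made-up outcomes and utility values:

```python
# vNM expected-utility maximization: rank lotteries (probability
# distributions over outcomes) by the expected value of a fixed
# utility function. All outcomes and numbers here are hypothetical.

utility = {"apple": 1.0, "banana": 0.6, "nothing": 0.0}

def expected_utility(lottery: dict) -> float:
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())

lotteries = {
    "safe":  {"banana": 1.0},                 # EU = 0.6
    "risky": {"apple": 0.7, "nothing": 0.3},  # EU = 0.7
}
print(max(lotteries, key=lambda name: expected_utility(lotteries[name])))  # risky
```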
vNM is about rationality, i.e. decisions, not values.
Being friendly to imperfect agents is something I’ve seen no evidence for; it’s very hard even to define.