This seems interesting and novel to me, but (of course) I’m still skeptical.
> I gave the relevant example of relatively well-understood values, preference for lower x-risks.
Preference for lower x-risk doesn’t seem “well-understood” to me, if we include in “x-risk” things like value drift/corruption, premature value lock-in, and other highly consequential AI-enabled decisions (potential existential mistakes) that depend on hard philosophical questions. I gave some specific examples in this recent comment. What do you think about the problems on that list? (Do you agree that they are serious problems, and if so how do you envision them being solved or prevented in your scenario?)