1) Sure.
2) Okay.
3) Yup.
4) This is weaselly. Sure, 1–3 are enough to establish that an okay outcome is possible, but they don’t really say anything about its probability. You also don’t address how powerful an optimization process would be trying to optimize these values.
5) Willing to assume for the sake of argument.
6) Certainly true but not certainly useful.
7) Doesn’t follow, unless you read 6 in a way that makes it potentially untrue.
All of this would make more sense if you tried to put probabilities to how likely you think certain outcomes are.
That was pretty much my take. I get the feeling that “okay” outcomes are a vanishingly small portion of probability space. This suggests to me that the marginal effort saved by stipulating “okay” outcomes instead of perfect CEV is extremely small, if not negative. (By negative, I mean that it would actually take additional effort to program an AI to maximize for “okay” outcomes instead of CEV.)
However, I didn’t want to ask a leading question, so I left it in its present form. It’s perhaps academically interesting that the desirability of outcomes as a function of “similarity to CEV” is a continuous curve rather than a binary good/bad step function. Still, I couldn’t really see any way of taking advantage of this. I posted mainly to see if others might spot potential low-hanging fruit.
I guess the interesting follow-up questions are these: Is there any chance that humans are sufficiently adaptable that human values are more than just an infinitesimally small sliver of the set of all possible values? If so, is there any chance this enables an easier alternative version of the control problem? It would be nice to have a plan B.