Do you think it is possible to capture the idealized utility function of humans merely from factual knowledge about the world dynamics? Is it possible to find the concept of “right” without specifying even the minimal instructions on how to find it in humans, by merely making AI search for a simple utility function underlying the factual observations (given that it won’t be remotely simple)? I started from this position several months back, but gradually shifted to requiring at least some anthropomorphic directions initiating the extraction of “right”, establishing a path by which representation of “right” in humans translates into representation of utility computation in AI. One of the stronger problems for me is that it’s hard to distinguish between observing terminal and instrumental optimization performed in the current circumstances, and you can’t start reiterating towards volition without knowing even approximately right direction of transformations to the agent (or whole environment, if agent is initially undefined). It would be interesting if it’s unnecessary, although of course plenty of safeguards are still in order, if there is an overarching mechanisms to wash them out eventually.
Do you think it is possible to capture the idealized utility function of humans merely from factual knowledge about the world dynamics? Is it possible to find the concept of “right” without specifying even the minimal instructions on how to find it in humans, by merely making AI search for a simple utility function underlying the factual observations (given that it won’t be remotely simple)? I started from this position several months back, but gradually shifted to requiring at least some anthropomorphic directions initiating the extraction of “right”, establishing a path by which representation of “right” in humans translates into representation of utility computation in AI. One of the stronger problems for me is that it’s hard to distinguish between observing terminal and instrumental optimization performed in the current circumstances, and you can’t start reiterating towards volition without knowing even approximately right direction of transformations to the agent (or whole environment, if agent is initially undefined). It would be interesting if it’s unnecessary, although of course plenty of safeguards are still in order, if there is an overarching mechanisms to wash them out eventually.