Given a hard-coded agent that explicitly computed the consequences of its actions, and then took the action which maximized expected value according to its utility function, we would observe precisely the opposite behavior. Mutate a single line of code, and the agent's functionality would almost certainly break. However, the agent would still be robust in the alignment sense, as its values would never drift.
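To make the picture concrete, here is a minimal sketch of such a hard-coded expected-utility maximizer. All names here (`utility`, `transition_model`, the toy actions and outcomes) are illustrative assumptions, not anything from the original discussion; the point is only that the agent's values live in an explicit, inspectable table, while the surrounding machinery is ordinary, fragile code.

```python
# Toy expected-utility maximizer. Everything here is a hypothetical
# illustration: the actions, outcomes, and probabilities are made up.

ACTIONS = ["left", "right", "wait"]

def utility(outcome: str) -> float:
    # The agent's "values": an explicit mapping from outcomes to scores.
    return {"goal": 1.0, "trap": -1.0, "start": 0.0}[outcome]

def transition_model(action: str) -> list[tuple[float, str]]:
    # Explicitly computed consequences: (probability, outcome) pairs.
    return {
        "left":  [(0.9, "goal"), (0.1, "trap")],
        "right": [(0.5, "goal"), (0.5, "trap")],
        "wait":  [(1.0, "start")],
    }[action]

def expected_value(action: str) -> float:
    return sum(p * utility(o) for p, o in transition_model(action))

def choose_action() -> str:
    # Take the action maximizing expected value under the utility function.
    return max(ACTIONS, key=expected_value)
```

Mutating almost any line (a flipped sign in `expected_value`, a swapped key in `transition_model`) breaks the agent's competence, yet the values themselves sit in one explicit table rather than being diffusely encoded across the whole program.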
Why are its values immune? If its utility function is not made of code, what is it made of?