“Can be successfully navigated towards” means that there exists a set of policies for the agent, reachable via reinforcement learning on the goal objective, which, when followed, would allow the agent to consistently achieve the goal (barring any drastic changes to the environment, though the policy may account for ordinary environmental fluctuations).
Thanks for the paper on causal entropic forces, by the way. I hadn’t seen this research before, but it synergizes well with ideas I’ve been having related to alignment. At the risk of being overly reductive, I think we could do worse than designing an AGI that predictively models the goal distributions of other agents (e.g., humans) and generates as its own “terminal” goals those states that maximize the entropy of goal distributions reachable by the other agents. Essentially, seeking to create a world from which humans (and other systems) have the best chance at directing their own future.
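To make that a bit more concrete, here’s a minimal toy sketch of the selection rule I have in mind. It assumes a hypothetical learned model `reachable_goal_dist(state)` that returns a probability vector over the goals other agents could actually reach from a given state; the names and the tiny hand-made example are purely illustrative, not a proposal for how such a model would be trained.

```python
import numpy as np

def goal_entropy(reach_probs):
    """Shannon entropy (in nats) of a distribution over goals
    that the other agents can reach from a given state."""
    p = np.asarray(reach_probs, dtype=float)
    p = p[p > 0]
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

def select_target_state(candidate_states, reachable_goal_dist):
    """Pick the candidate state from which the other agents'
    reachable-goal distribution has maximum entropy.

    `reachable_goal_dist(state)` is a stand-in for a predictive model
    of what goals the other agents could pursue and achieve from
    `state` -- in practice it would have to be learned."""
    return max(candidate_states,
               key=lambda s: goal_entropy(reachable_goal_dist(s)))

# Toy example: from state "A" humans can reach three goals roughly
# equally; from state "B" they are effectively locked into one.
dist = {"A": [0.4, 0.35, 0.25], "B": [0.97, 0.02, 0.01]}
print(select_target_state(["A", "B"], lambda s: dist[s]))  # -> "A"
```

The point of the toy example is just that the objective prefers “A”, the state that leaves the other agents’ futures most open, over “B”, where their options have collapsed; all of the real difficulty, of course, lives in the predictive model of reachable goals.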