And it feels like becoming a winner means consistently winning.
This reminds me strongly of the difficulty of accepting commitment strategies in decision theory, as in Parfit’s Hitchhiker: one gets the impression that a win-oriented rational agent should win in every situation (acting greedily), but in reality this is not always what winning looks like (optimal policy rather than optimal actions).
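As a toy illustration (the payoff numbers here are my own assumptions, purely for the sketch), per-action and per-policy optimization come apart in Parfit’s Hitchhiker like this:

```python
# Toy Parfit's Hitchhiker. Payoffs are illustrative assumptions.
DIE = -1_000_000   # left in the desert
PAY = -1_000       # rescued, then pays the driver as committed
FREE_RIDE = 0      # rescued and never pays

def outcome(policy_pays: bool) -> int:
    """The driver predicts the agent's policy perfectly:
    he rescues the agent iff the agent is the kind that pays."""
    return PAY if policy_pays else DIE

# Action-level view, evaluated *after* rescue: refusing dominates.
best_local_action = max(["pay", "refuse"],
                        key=lambda a: PAY if a == "pay" else FREE_RIDE)

# Policy-level view, evaluated over the whole game: paying wins.
best_policy_pays = max([True, False], key=outcome)

print(best_local_action)  # 'refuse' -- greedy per-action optimization
print(best_policy_pays)   # True     -- the committed payer survives
```

The greedy action-chooser is predicted, left behind, and loses; the agent that optimizes its policy takes a locally suboptimal action and wins.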
Let’s try to apply this to a more confused topic. Risky. Recently I’ve slightly updated away from the mesa paradigm, after reading the following in Reward is not the optimization target:

Stop worrying about finding “outer objectives” which are safe to maximize.[9] I think that you’re not going to get an outer-objective-maximizer (i.e. an agent which maximizes the explicitly specified reward function).
Instead, focus on building good cognition within the agent.
In my ontology, there’s only one question: How do we grow good cognition inside of the trained agent?
How does this relate to the goal/path confusion? Consider alignment strategies as paths:
Outer + Inner alignment aims to be an alignment strategy, but it is only the straight path. Any homotopic alignment path could be safe as well, and the safety of the path is our only real concern.
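To make the path picture slightly more concrete, here is a minimal sketch in my own notation (the space S, the endpoints, and the Safe region are illustrative assumptions, not anything from the original post):

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Sketch: S is a space of possible systems, s_0 today's systems,
% s_1 an aligned AGI, Safe the region of acceptable systems.
An alignment strategy is a path
\[
  \gamma : [0,1] \to \mathcal{S}, \qquad \gamma(0) = s_0, \quad \gamma(1) = s_1 .
\]
Outer + inner alignment is the straight-line path
% (writing it this way presumes linear structure on S; purely illustrative)
\[
  \gamma_{\mathrm{straight}}(t) = (1 - t)\, s_0 + t\, s_1 ,
\]
but the requirement that matters constrains the whole path, not its shape:
\[
  \gamma(t) \in \mathrm{Safe} \quad \text{for all } t \in [0,1].
\]
\end{document}
```

On this picture the straight path has no privileged status: any homotopic deformation of it that stays inside Safe does the job equally well.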