Thanks for sharing. Could you explain this phrasing in the abstract, though?
Eventually, retargetable training procedures may train real-world agents which seek power over humans.
As I understand it, agents inherently have some non-zero possibility of seeking power over humans, other agents, etc., by definition.