So is Agent Foundations primarily about understanding the nature of agency so we can detect and/or control it in artificial models, or does it also include equipping AI with the means of detecting and predictively modeling agency in other systems? I ask because I strongly suspect the latter will be crucial to solving the alignment problem.
The best definition I have at the moment: agents are systems that (a) actively maintain their internal state within a bounded range of viability in the face of environmental perturbations (which would apply to all living systems), and (b) can form internal representations of arbitrary goal states and use those representations to reinforce and adjust their behavior to achieve them. An AGI whose architecture is biased to recognize needs and goals in other systems, not just those matching human-specific heuristics, could be designed to adopt those predicted needs and goals as its own provisional objectives, steering the world toward its continually evolving best estimate of what other agentic systems want the world to be like. I think this would be safer, more robust, and more scalable than trying to define all human preferences up front.
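To make the shape of that loop concrete, here's a minimal sketch in Python. Every name in it (`detect_agents`, `provisional_objective`, and so on) is a hypothetical placeholder I'm inventing for illustration, not an existing system; the point is only the structure, where the AI repeatedly re-estimates who the agents are and what they want, and treats that estimate as its own objective.

```python
# Purely illustrative sketch of the proposed loop: detect candidate agentic
# systems, infer their goal states, and adopt the confidence-weighted estimate
# of what they want as the AI's provisional objective. All names are
# hypothetical placeholders, not a real API.

from dataclasses import dataclass
from typing import Callable, List

WorldState = dict  # stand-in for whatever world model the AI maintains


@dataclass
class InferredAgent:
    confidence: float                          # how likely this subsystem is agentic
    goal_model: Callable[[WorldState], float]  # estimated preference over world states


def detect_agents(state: WorldState) -> List[InferredAgent]:
    """Placeholder for a learned detector that flags subsystems which
    (a) hold their internal variables within a viability range under
    perturbation and (b) appear to represent and pursue goal states."""
    return []  # a real system would return its current inferences here


def provisional_objective(state: WorldState, agents: List[InferredAgent]) -> float:
    """The AI's objective: its confidence-weighted best estimate of how well
    this world state satisfies what the detected agents want."""
    return sum(a.confidence * a.goal_model(state) for a in agents)


def choose_action(state: WorldState, actions: list, predict) -> object:
    """Re-estimate who the agents are and what they want, then pick the action
    whose predicted outcome (predict: (state, action) -> WorldState) best
    serves that continually updated provisional objective."""
    agents = detect_agents(state)
    return max(actions, key=lambda act: provisional_objective(predict(state, act), agents))
```

This is only meant to illustrate why I think the approach scales: the objective is never fixed up front, it's re-derived from the agency detector's latest estimates at every step.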
These are just my thoughts. Take from them what you will.
I am not personally working on “equipping AI with the means of detecting and predictively modeling agency in other systems”, but I have heard other people talk about that cluster of ideas. I think it’s in-scope for agent foundations.