Upon more thought, I definitely agree with you more, but still sort of disagree.
You’re absolutely right that I wasn’t actually thinking about the kind of AI you were talking about. And evolution does reliably teach animals to have theory of mind. And if the training environment is at least sorta like our ancestral environment, it does seem natural that an AI would learn to draw the boundary around humans more or less the same way we do.
But our evolved theory of mind capabilities are still fairly anthropocentric, suited to the needs, interests, and capabilities of our ancestors, even when we can extend them a bit using abstract reasoning. Evolving an AI in a non-Earth environment in a non-human ecological niche, or optimizing an AI using an algorithm that diverges from evolution (e.g. by allowing more memorization of neuron weights) would give you different sorts of theories of mind.
Aside: I disagree that the examples I gave in the previous comment require verbal reasoning. They can be used nonverbally just fine. But using a model doesn’t feel like using a model; it feels like perceiving the world. E.g. I might say “birds fly south to avoid winter,” which sounds like a mere statement but actually inserts my own convenient model of the world (where “winter” is a natural concept) into a statement about birds’ goals.
An AI that’s missing some way of understanding the world that humans find natural might construct a model of our values that’s missing entire dimensions. Or an AI that understands the world in ways we don’t might naturally construct a model of our values that has a bunch of distinctions and details where we would make none.
What it means to “empower” some agent does seem more convergent than that. Maybe not perfectly convergent (e.g. evaluating social empowerment seems pretty mixed-up with subtle human instincts), but enough that I have changed my mind, and am no longer most concerned about the AI simply failing to locate the abstraction we’re trying to point to.
So it sounds like we are now actually mostly in agreement.
I agree there may be difficulties in learning and grounding accurate mental models of human motivations/values in the AGI, but that is all the more reason to take the brain-like path with anthropomorphic AGI. Still, I hedge between directly emulating human empathy/altruism vs. using external empowerment. External empowerment may be simpler/easier to specify and thus more robust against failures in matching human value learning directly, but it also has its own potential failure modes (the AGI would want to keep you alive and wealthy, but it may not care about your suffering/pain as much as we’d like). But I also suspect it could turn out that human value learning follows a path of increasing generalization and robustness, starting with oldbrain social instincts as a proxy to ground newbrain empathy learning, which eventually generalizes widely to something more like external empowerment. At least some humans generally optimize for the well-being (non-suffering) and possibly empowerment of animals, and that expanded/generalized circle of empathy will likely include AI, even if it doesn’t obviously mimic human emotions.
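For concreteness, here is a minimal sketch of what “external empowerment” could look like as a formal objective, using the standard information-theoretic definition of empowerment (the channel capacity from an agent’s action sequence to the resulting state). The notation below ($A^n_H$ for the human’s next $n$ actions, $S_{t+n}$ for the state $n$ steps later) is illustrative only, not something pinned down in this discussion:

$$\mathcal{E}^H_n(s_t) \;=\; \max_{p(a^n_H)} I\big(A^n_H;\, S_{t+n} \mid s_t\big), \qquad \pi_{\mathrm{AGI}} \in \arg\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_t \mathcal{E}^H_n(s_t)\Big]$$

That is, the AGI is rewarded for steering the world into states from which the human’s own actions have the greatest causal influence over the future. The failure mode noted above falls out directly: a human can be kept alive, wealthy, and highly empowered while still suffering, because suffering per se doesn’t reduce the mutual information between their actions and future states.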