This is a really cool post. Do you have book/blog recommendations for digging into non-Western philosophies?
On the philosophical tendencies you see, I would like to point out some examples which don’t follow them. But on the whole I agree with your assessment.
For symbolic AI, I would say a big part of the Agent Foundations researchers (which includes a lot of MIRI researchers and people like John S. Wentworth) definitely do consider symbolic AI in their work. I won’t go as far as saying that they don’t care about connectionism, but I definitely don’t get a “connectionism or nothing” vibe from them.
For cognitivist AI, examples of people thinking in terms of internal behaviors are Evan Hubinger from MIRI and Richard Ngo, who worked at DeepMind and is now doing a PhD in Philosophy at Oxford.
For reasonableness/sense-making, researchers on Debate (like Beth Barnes, Paul Christiano, and Joe Collman) and the people I mentioned in the symbolic AI point also seem to consider more argumentative and logical forms of rationality (in combination with decision-theoretic reasoning).
> 4. Pluralism as respect for the equality and autonomy of persons
This feels like something that a lot of current research already focuses on. Most people trying to learn values and preferences focus on the individual preferences of people at a specific point in time, which seems pretty good for respecting differences in value. Where this would break down is if the specific formalism (like utility functions over histories) were strongly biased against some forms of value.
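To gesture at what I mean by the formalism being biased (a toy Python sketch of my own, not anything from the post): a utility function over histories forces every pair of trajectories onto a single real-valued scale, so values that treat some histories as genuinely incomparable have nowhere to live.

```python
from typing import Callable, Sequence

State = str
History = Sequence[State]  # a whole trajectory, not just a final state

# The formalism: every history must map to one real number, so all
# values are forced onto a single total order.
UtilityFn = Callable[[History], float]

def prefers(u: UtilityFn, h1: History, h2: History) -> bool:
    """The only judgments the formalism can express: >, <, or =."""
    return u(h1) > u(h2)

# Process-sensitive values fit fine (histories carry the process),
# but a value holding two histories incomparable cannot be encoded:
# the float return type forces a verdict either way.
u: UtilityFn = lambda h: float("fair_vote" in h) + 0.1 * len(h)
print(prefers(u, ["debate", "fair_vote", "act"], ["act"]))  # True
```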
> Furthermore, when it comes to human values, then at least in some domains (e.g. what is beautiful, racist, admirable, or just), we ought to identify what’s valuable not with the revealed preference or even the reflective judgement of a single individual, but with the outcome of some evaluative social process that takes into account pre-existing standards of valuation, particular features of the entity under evaluation, and potentially competing reasons for applying, not applying, or revising those standards.

> As it happens, this anti-individualist approach to valuation isn’t particularly prominent in Western philosophical thought (but again, see Anderson). Perhaps then, by looking towards philosophical traditions like Confucianism, we can develop a better sense of how these normative social processes should be modeled.
Do you think this relates to ideas like computational social choice? I guess the difference with the latter is that it takes individual preferences as building blocks, whereas you seem to want community norms as primitives.
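For concreteness, here is a minimal sketch of the social-choice picture (my own illustration in Python; Borda counting is just one standard aggregation rule among many), where individual orderings are the only primitives and anything community-level appears only as an aggregate:

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate individual preference orderings (each a list of
    alternatives, best first) into one social ranking.

    Classic Borda rule: on an n-item ballot, the alternative in
    position p (0-indexed from the top) receives n - 1 - p points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += n - 1 - position
    # Highest total score first.
    return sorted(scores, key=scores.get, reverse=True)

# Individual preferences are the only inputs; the "norm" is whatever
# falls out of the aggregation.
ballots = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
print(borda_count(ballots))  # ['b', 'a', 'c']
```

If community norms were primitives instead, the inputs themselves would have to be something other than individual ballots, which is where your proposal seems to depart from this framework.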
I definitely don’t know Confucianism well enough to discuss it in this context, but I’m really not convinced that all social norms are valuable. For some (like those around language and morality), Abram Demski’s Learning Normativity agenda feels relevant.
> I think this methodology is actually a really promising way to deal with the question of ontological shifts. Rather than framing ontological shifts as quasi-exogenous occurrences that agents have to respond to, it frames them as meta-cognitive choices that we select with particular ends in mind.
My first reaction is horror at imagining how this approach could allow an AGI to make a decision with terrible consequences for humans, and then change its concepts to justify it to itself. Being more charitable to your proposal, I do think that this can be a good analysis perspective, especially for understanding reward tampering problems. But I want the algorithm/program dealing with ontological crises to keep some tethers to the important things I want it aligned to. So in some sense, I want AGIs to be some form of realist about concepts like corrigibility and impact.
> The worry here is that consciousness may have evolved in animals because it serves some function, and so, AI might only reach human-level usefulness if it is conscious. And if it is conscious, it could suffer. Most of us who care about sentient beings besides humans would want to make sure that AI doesn’t suffer — we don’t want to create a race of artificial slaves. So that’s why it might be really important to figure out whether agents can have functional consciousness without suffering.
I’m significantly more worried about AGI creating terrible suffering in humans than about AIs and AGIs themselves suffering. This is probably an issue with my moral circle, but I still stand by that priority. That being said, I’m not in favor of pointless suffering either. So finding ways to limit this suffering without compromising alignment seems worthwhile. Thanks for pointing me to this question and this paper.