AI Alignment and the Classical Humanist Tradition

Hi guys, I’d like to share a proposal regarding AI alignment: training AI in the curriculum of Classical virtue ethics could be a promising approach to alignment, for two reasons. A) General virtues with many exemplifications can help us teach the AI what we would really want it to do, even when we can’t micromanage it. B) This pedagogy seems to be a good fit for AI’s style of learning more generally.

Background i) The Pedagogy of Classical Art

In the Classical Humanist tradition, the pedagogy depends on whether one studies a subject or practices an art. Art is understood as craft or skill, for example the art of speaking well and persuading (rhetoric) and the art of living (virtue ethics). Training in the arts is done by:

  1. Studying general principles of the art

  2. Seeing the art practiced, either by seeing it done by a master, or by reading or otherwise studying exemplars

  3. Practicing the art (imitation, etc.)

cf. John Sellars, «The Art of Living», and Walker, «The Genuine Teachers of This Art»

Notably, step 2) seems to be close to how AI learns: by reading material, and then using it as a basis for its own imitation and application.

Background ii) Classical Humanist Ethics as Art

The Classical Humanist tradition of virtue ethics operates in the wake of Socrates, and includes Plato, Aristotle, the Stoics, the Epicureans, and others. Later, some Catholic thinkers took this up as well. They all practiced virtue as an art rather than as (just) a science.

The Stoic tradition is especially practical. As described by John Sellars, it has general and theoretical texts on the virtues, especially on the «Cardinal» virtues of justice, temperance, courage, and wisdom. It also has moral biographies, which show how exemplars lived up to, or failed to live up to, these virtues. In this way one can learn and adopt the virtues by imitating the exemplars, and one can also gain a lot of experience second-hand by reading about their lives.

Massimo Pigliucci has created a beginner’s curriculum in the last chapter of his «Quest for Character». Donald Robertson has written on the same topic in his «How to Think Like a Roman Emperor». Both write in the tradition of Pierre Hadot’s «Philosophy as a Way of Life».

Classical Humanist Virtue Ethics and AI

The combination of general concepts of virtues (for example justice, benevolence, temperance, etc.), with many detailed exemplifications, might be an ideal way to teach an AI to do what we would deem wise and just, even in situations where we can’t micromanage the AI. And it seems to me that the Classical Humanist tradition has a pedagogy of virtue that might be a good fit for AI’s style of learning more broadly.

Suggestion

My suggestion is that we experiment with training an AI in the Classical Humanist tradition of virtue ethics, using the classical pedagogy of art: theoretical treatment of the virtues combined with practical examples, along the lines of Hadot, Sellars, Pigliucci and Robertson. For that I would need the help of someone with more technical skill.
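For anyone with more technical skill who wants to pick this up, here is a rough sketch of how the three steps of the classical pedagogy (principles, exemplars, practice) might be laid out as training records. Everything in it is an assumption for illustration only: the names Virtue, Exemplar and build_curriculum, the record fields, and all the placeholder texts are hypothetical, and the sketch says nothing about which training method would actually consume these records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Exemplar:
    """One episode from a moral biography (hypothetical structure)."""
    person: str
    episode: str   # short narrative of what the person did
    upheld: bool   # True if the episode lives up to the virtue


@dataclass
class Virtue:
    """A virtue with its general principle and its exemplifications."""
    name: str
    principle: str
    exemplars: List[Exemplar] = field(default_factory=list)


def build_curriculum(virtues: List[Virtue]) -> List[Dict[str, str]]:
    """Flatten virtues into records ordered as principle, exemplars, practice."""
    records: List[Dict[str, str]] = []
    for v in virtues:
        # Step 1: the general principle of the art.
        records.append({"stage": "principle", "virtue": v.name, "text": v.principle})
        # Step 2: exemplars, including failures to live up to the virtue.
        for ex in v.exemplars:
            verdict = "lives up to" if ex.upheld else "fails to live up to"
            text = f"{ex.person}: {ex.episode} This {verdict} {v.name}."
            records.append({"stage": "exemplar", "virtue": v.name, "text": text})
        # Step 3: a practice prompt asking the learner to apply the virtue.
        prompt = f"Describe how someone practicing {v.name} would act in a new situation, and explain why."
        records.append({"stage": "practice", "virtue": v.name, "text": prompt})
    return records


# Placeholder content only; a real curriculum would draw on the actual texts.
sample = [
    Virtue(
        name="justice",
        principle="<placeholder: general account of justice>",
        exemplars=[
            Exemplar("Exemplar A", "<placeholder episode>", upheld=True),
            Exemplar("Exemplar B", "<placeholder episode>", upheld=False),
        ],
    )
]
records = build_curriculum(sample)
for r in records:
    print(r["stage"], "|", r["text"])
```

The one design choice worth noting is that failures are kept next to successes, since the moral biographies show both how exemplars lived up to the virtues and how they fell short.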

(Sidenote: the above might primarily target “outer alignment”. For strengthening “inner alignment”, one could take advantage of the fact that the Classical virtue tradition operates with models of the self-concept. If we could get the AI to adopt a virtuous self-concept, it might also become “inwardly aligned”. This is how human alignment with the virtues functions.)

Appreciate any thoughts/suggestions.