I think this is worth thinking about. Aligning to virtues is a meaningfully different option than aligning to values or goals, and very different from aligning for corrigibility or instruction-following.
So I support the project. I hope the remainder of the project is a detailed analysis of the advantages and disadvantages.
Each approach has different merits. While virtues are more widely agreed upon, I worry that's because they're somehow vaguer than consequentialist values. Some of the virtues you mention sound very good; others leave me wondering.
Kind to whom? Well, anyone to whom the term applies, probably sentient beings, which I think probably does have a coherent definition. So that sounds good (although I wouldn't expect there to be a lot of humans in a future ruled by kindness).
Duty!? To what? That could be to anything from the best to the worst person, thing, or system. Honor? The kind that causes killings, or hopefully a better kind?
Variants of this critique can be cast in any direction: too much is left either undefined or unanalyzed. Virtue alignment leaves more undefined; value and intent alignment leave predictable-in-principle consequences unanalyzed.
More analysis on any of these will probably clarify all of them and the whole project, so I wish you luck and speed!