If we want AIs to be aligned to humanity’s Coherent Extrapolated Volition (CEV), we’re so far from that goal that it might be productive to attempt to define any plausible CEV at all.
You could do so in writing, attempting to declare the virtues or values to which we should align AIs. A lot of people, including Richard Ngo, publish writing about this.
You could also do so in detection software. You could try to make a system that can detect or rank things you care about, or, ideally, the average of what we all care about. A system that could pick out people and animals and trees in an image. And also a system that can distinguish a painting that is masterful and full of effort from a painting that is full of neither.
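As a minimal sketch of what the interface to such a detector might look like: every name below (CareDetector, Detection, the hard-coded priors) is a hypothetical stand-in, not a real model. A working version would learn its scores from data rather than looking them up.

```python
from dataclasses import dataclass

# Sketch only. The names and logic here are illustrative stand-ins for
# whatever model a real "care detector" would actually be built on.

@dataclass
class Detection:
    label: str    # e.g. "person", "animal", "tree"
    score: float  # 0.0 (indifferent) to 1.0 (deeply cared about)

class CareDetector:
    """Scores how much humanity, on average, cares about what's in an input."""

    # Stand-in weights; a real system would learn these, not hard-code them.
    PRIORS = {"person": 1.0, "animal": 0.8, "tree": 0.6}

    def detect(self, labels: list[str]) -> list[Detection]:
        # In a real detector, `labels` would come from an upstream
        # vision or language model run over the raw input.
        return [Detection(label, self.PRIORS.get(label, 0.1)) for label in labels]

if __name__ == "__main__":
    detector = CareDetector()
    for d in detector.detect(["person", "tree", "stapler"]):
        print(f"{d.label}: {d.score:.2f}")
```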
If we could make a detector like this, it would far surpass the ability of LLMs. For, although LLMs can, if prompted, declare to some extent the values that are important to a lot of humanity, they are somewhat horrible at detecting signs of life in writing and art, and especially horrible at doing so when that art or life diverges from the norm.
The benefit of a detector, as opposed to a declared set of values, is that a detector, if exposed via some API, could immediately and precisely be used by an AI system. On the other hand, it would take a lot of work to translate written values into AI action.
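To make the “exposed via some API” point concrete, here is a minimal sketch using FastAPI, assuming the hypothetical CareDetector from the sketch above has been saved as care_detector.py. The endpoint name and response shape are illustrative, not a proposed standard.

```python
# Sketch only: exposes the hypothetical CareDetector over HTTP so that
# an AI system could query it directly, with no interpretation step
# between stated values and action.
# Requires: pip install fastapi uvicorn

from fastapi import FastAPI
from pydantic import BaseModel

from care_detector import CareDetector  # the sketch above, saved as care_detector.py

app = FastAPI()
detector = CareDetector()

class ScoreRequest(BaseModel):
    labels: list[str]  # stand-in for a richer input (image, text, sensor data)

@app.post("/care-score")
def care_score(req: ScoreRequest) -> dict[str, float]:
    # Returns a 0-1 "how much does humanity care about this" score per label.
    return {d.label: d.score for d in detector.detect(req.labels)}

# Run with: uvicorn care_api:app  (if this file is saved as care_api.py)
```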
Even if you’re not working at an AGI lab, if you’re able to build some independent module that can sense, on a spectrum, what it is that humanity does or doesn’t care about, in any data stream or any physical object, you’ll have done something enormous. You’ll have at least made it possible for an AI system to care about life if it were ever in its interest to do so. Without life-detection infrastructure already built, ascertaining what we care about is simply much more inconvenient, and regard for what we care about is therefore less likely.