Beginning resources for CEV research

I’ve been working on metaethics/CEV research for a couple of months now (publishing mostly prerequisite material) and figured I’d share some of the sources I’ve been using.

CEV sources.

Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.


Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?

  • Reflective equilibrium. Yudkowsky’s proposed extrapolation works analogously to what philosophers call ‘reflective equilibrium.’ The most thorough work here is the 1996 book by Daniels, and there have been lots of papers, but this genre is only barely relevant for CEV. Basically, an entirely new literature on volition-extrapolation algorithms needs to be created.

  • Full-information accounts of value and ideal observer theories. These are the philosophers’ names for theories of value that, like CEV, appeal to ‘what we would want if we were fully informed, etc.’ or ‘what a perfectly informed agent would want.’ There’s some literature on this, but it’s only marginally relevant to CEV. Again, an entirely new literature needs to be written to solve this problem.


Metaethics. Should we use CEV, or something else? What does ‘should’ mean?


Building the utility function. How can a seed AI be built? How can it read what to value?


Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification?


Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms.


Additional suggestions welcome. I’ll try to keep this page up-to-date.