CEV is not meant to depend on the state of human society. It is supposed to be derived from “human nature”, e.g. genetically determined needs, dispositions, norms and so forth, that are characteristic of our species as a whole.
(This is false. CEV is a process that combines extrapolated volitions of individual humans, which is meant to depend fully on the state of every particular person and their wishes about how their wishes are to be extrapolated. See the value theory and the metaethics sequences, in particular, stuff like this, as well as the CEV Arbital page. E.g., CEV of humanity is plausibly very different from the CEV of ancient Greeks, who might even, on reflection, want to die gloriously in battles.)
CEV is a process that combines extrapolated volitions of individual humans, which is meant to depend fully on the state of every particular person and their wishes about how their wishes are to be extrapolated
I think it is unclear what the exact initial data are supposed to be, or need to be.
The value system that CEV outputs is going to be abstract at some level. It won’t say directly “if someone has a toothache, fix the toothache”; that should follow from a more general principle, combined with the nature of toothaches. The same goes for the extrapolations of individuals and the aggregations of their preferences: the CEV value system in action has to care about particulars, but what it does with those particulars will be governed by an abstract definition.
The question is, what do we need to know about humanity, in order for CEV’s Value Extrapolation Procedure to arrive at the correct abstract definition? This is hard to answer if we don’t know what the VEP is in any detail. But finding a correct VEP is also part of the process.
Apparently a popular proposal for the VEP is something like “upload 10,000 philosophers and let them deliberate for as many subjective years as they need to solve all CEV’s problems and arrive at a consensus”, or similar proposals according to which there is a digital parliament of human proxies (e.g. Jan Leike’s “simulated deliberative democracy”).
I guess this defines a possible VEP, but I have long thought that a better VEP would involve theoretical identification of the existing “human decision procedure” (which I assume is a topic for cognitive neuroscience, and which in the individual is determined through a mix of genes, culture, and life incidents), and then extrapolating that. And again, the human decision procedure would in some way be a template, a schema whose details are “filled out” differently in different individuals (similar to how we learn the grammar and vocabulary of our native languages); and some of CEV’s extrapolation would depend on those details, some of it only on the structure of the schema.
You might even expect that Leike’s democracy would arrive at something like this, rather than just deciding everything via a vote among our extrapolated higher selves, forever. But then do you need the whole digression into upload societies devoted to the task of alignment? You just do AI-assisted neuroscience, figure out how human nature actually works, and “extrapolate” that.
Years ago, I thought that might be what would happen. Instead, the VEP that our frontier AI companies are employing is to engage in value learning from the training corpora, as part of general world-modeling, and then to refine and activate it with RLHF, constitutions, and so forth.