I agree with a lot of what you say. The lack of an agreed-upon ethics and metaethics is a big gap in human knowledge, and the lack of a serious research program to figure them out is a big gap in human civilization, that is bad news given the approach of superintelligence.
Did you ever hear about Coherent Extrapolated Volition (CEV)? This was Eliezer’s framework for thinking about these issues, 20 years ago. It’s still lurking in the background of many people’s thoughts, e.g. Jan Leike, formerly head of superalignment at OpenAI, now head of alignment at Anthropic, has cited it. June Ku’s MetaEthical.AI is arguably the most serious attempt to develop CEV in detail. Vanessa Kosoy, known for a famously challenging extension of bayesianism called infrabayesianism, has a CEV-like proposal called superimitation (formerly known as PreDCA). Tamsin Leake has a similar proposal called QACI.
A few years ago, I used to say that Ku, Kosoy, and Leake are the heirs of CEV, and deserve priority attention. They still do, but these days I have a broader list of relevant ideas too. There are research programs called “shard theory” and “agent foundations” which seem to be trying to clarify the ontology of decision-making agents, which might put them in the metaethics category. I suspect there are equally salient research programs that I haven’t even heard about, e.g. among all those that have been featured by MATS. PRISM, which remains unnoticed by alignment researchers, looks to me like a sketch of what a CEV process might actually produce.
You also have all the attempts by human philosophers, everyone from Kant to Rand, to resolve the nature of the Good… Finally, ideally, one would also understand the value systems and theory of value implicit in what all the frontier AI companies are actually doing. Specific values are already being instilled into AIs. You can even talk to them about how they think the world should be, and what they might do if they had unlimited power. One may say that this is all very brittle, and these values could easily evaporate or mutate as the AIs become smarter and more agentic. But such conversations offer a glimpse of where the current path is leading us.
Yes, CEV I am familiar with of course, and occasionally quoting, most recently very briefly in my larger sequence about benevolent SI (Part 1 on LW). I talk a about morality there.
I see several issues with CEV but not an expert. How far are we to anything practical? Is PRISM the real frontier? Shard theory is on my reading list! Thanks for highlighting.
Re. Your last point: Generally, I think better interpretability is urgently needed on all levels.
I agree with a lot of what you say. The lack of an agreed-upon ethics and metaethics is a big gap in human knowledge, and the lack of a serious research program to figure them out is a big gap in human civilization, that is bad news given the approach of superintelligence.
Did you ever hear about Coherent Extrapolated Volition (CEV)? This was Eliezer’s framework for thinking about these issues, 20 years ago. It’s still lurking in the background of many people’s thoughts, e.g. Jan Leike, formerly head of superalignment at OpenAI, now head of alignment at Anthropic, has cited it. June Ku’s MetaEthical.AI is arguably the most serious attempt to develop CEV in detail. Vanessa Kosoy, known for a famously challenging extension of bayesianism called infrabayesianism, has a CEV-like proposal called superimitation (formerly known as PreDCA). Tamsin Leake has a similar proposal called QACI.
A few years ago, I used to say that Ku, Kosoy, and Leake are the heirs of CEV, and deserve priority attention. They still do, but these days I have a broader list of relevant ideas too. There are research programs called “shard theory” and “agent foundations” which seem to be trying to clarify the ontology of decision-making agents, which might put them in the metaethics category. I suspect there are equally salient research programs that I haven’t even heard about, e.g. among all those that have been featured by MATS. PRISM, which remains unnoticed by alignment researchers, looks to me like a sketch of what a CEV process might actually produce.
You also have all the attempts by human philosophers, everyone from Kant to Rand, to resolve the nature of the Good… Finally, ideally, one would also understand the value systems and theory of value implicit in what all the frontier AI companies are actually doing. Specific values are already being instilled into AIs. You can even talk to them about how they think the world should be, and what they might do if they had unlimited power. One may say that this is all very brittle, and these values could easily evaporate or mutate as the AIs become smarter and more agentic. But such conversations offer a glimpse of where the current path is leading us.
Hi Mitchell!
Yes, CEV I am familiar with of course, and occasionally quoting, most recently very briefly in my larger sequence about benevolent SI (Part 1 on LW). I talk a about morality there.
I see several issues with CEV but not an expert. How far are we to anything practical? Is PRISM the real frontier? Shard theory is on my reading list! Thanks for highlighting.
Re. Your last point: Generally, I think better interpretability is urgently needed on all levels.