Hi, and thanks for the comment.
I’m new to the forum and wasn’t familiar with the CEV school of alignment, so I found this page and read through it: https://www.lesswrong.com/w/coherent-extrapolated-volition-alignment-target
I’m not sure whether what I’m proposing falls into the CEV category, but it might.
I want to take a stab at your question of “which framework?”
I agree that any ethics can be formalized. In fact, Spinoza did exactly this for someone else’s system when he wrote Descartes’ Principles of Philosophy (1663), recasting Cartesian philosophy in the geometric method to teach it to his student Johannes Casearius.
I’m suggesting Spinoza’s Ethics because it’s already formalized, which makes it a good starting point. It’s a concrete thing people can point to and say, “I think the demonstration of Part 1, Proposition 11, that ‘nature necessarily exists,’ is wrong because xyz, and it should be abc.”
We could take the route of a “cognitive neuroscientific study of human beings, to discover both the values that people are actually implicitly pursuing, and also a natural metaethics (or ontology of value) implicit in how our brains represent reality.” Or, while we wait for that to happen, we could use the Ethics as a theoretical approximation, implement it, and test the results. This implement-it-and-see-what-happens approach seems to be what Anthropic is doing with the Claude constitution, which includes some well-chosen ethical statements that have proven to give good results. I’m just suggesting an upgrade to a more formalized, comprehensive starting point that might give even better results and cover edge cases that haven’t been encountered yet.
I put together a “Value Graph” to try to quantify the values, or at least to give the AI model some heuristics it can use as a basis for evaluation: https://jeff962.github.io/Ethics_Exposed/Supplemental/Value_Graph
I also had Claude run through some typical difficult scenarios to see how it applied these values compared with other approaches. The results were encouraging enough to suggest trying it for real: https://jeff962.github.io/Ethics_Exposed/Supplemental/Difficult-AI-Scenarios
Thanks again for the comment! I was hoping to get some push-back on whether this idea has merit.