I met someone on here who wanted to do this with Kant. I recently thought about doing it with Badiou…
The LLM work being done with mathematical proofs shows that LLMs can work productively within formalized frameworks. Here the obvious question is: which framework?
Spinozist ethics stands out because it was already formalized by Spinoza himself, and it seems to appeal to you because it promises universality on the basis of shared substance. However, any ethics can be formalized, even a non-universal one.
For the CEV school of alignment, the framework is something that should be found by cognitive neuroscientific study of human beings, to discover both the values that people are actually implicitly pursuing, and also a natural metaethics (or ontology of value) implicit in how our brains represent reality. The perfect moral agent (from a human standpoint) is then the product of applying this natural metaethics to the actual values of imperfect human beings (this is the “extrapolation” in CEV).
I would be interested to know if other schools of alignment have their own principled way of identifying what the values framework should be.
Hi, and thanks for the comment.
I’m new to the forum and wasn’t familiar with the CEV school of alignment. I found this post and read through it. https://www.lesswrong.com/w/coherent-extrapolated-volition-alignment-target
I’m not sure what I’m proposing falls into the CEV category, but maybe.
I want to take a stab at your question about “which framework?”.
I agree that any ethics can be formalized. In fact, Spinoza did something like this himself when he wrote Descartes’ Principles of Philosophy (1663), which recast Cartesian philosophy in the geometric method so he could teach it to his student Johannes Casearius.
I’m suggesting Spinoza’s Ethics because it’s already formalized and makes a good starting point. It’s a concrete thing that people can point to and say, “I think the demonstration of Part 1, Proposition 11, that ‘nature necessarily exists,’ is wrong because xyz, and it should be abc.”
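To make the “point to a proposition” idea concrete, here is a minimal sketch, assuming the demonstrations are encoded as a simple citation map. The entries below are illustrative only and should be checked against the text; the names `CITES` and `affected_by` are hypothetical. It shows how rejecting one item identifies everything downstream that would need to be re-examined.

```python
# Illustrative citation map: each item lists what its demonstration cites.
# These entries are an assumption for the sketch, not a verified transcription.
CITES = {
    "E1P7": ["E1P6C", "E1D1"],
    "E1P11": ["E1A7", "E1P7"],
}


def affected_by(target, cites=CITES):
    """Return every item whose demonstration depends, directly or indirectly, on target."""
    affected = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for item, deps in cites.items():
            if current in deps and item not in affected:
                affected.add(item)
                frontier.append(item)
    return sorted(affected)


# Items to re-examine if E1P6C is rejected.
print(affected_by("E1P6C"))
```

The same lookup works for any formalized framework, so it does not depend on choosing Spinoza in particular.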
We could take the route of “cognitive neuroscientific study of human beings, to discover both the values that people are actually implicitly pursuing, and also a natural metaethics (or ontology of value) implicit in how our brains represent reality.” Or, while we wait for that to happen, we could use the Ethics as a theoretical approximation, implement it, and test the results. This implement-it-and-see-what-happens approach seems to be what Anthropic is doing with the Claude constitution, which includes some well-chosen ethical statements that have proven to give good results. I’m just suggesting an upgrade to a more formalized, comprehensive starting point that might give even better results and cover edge cases that haven’t been encountered yet.
I put together a “Value Graph” to try to quantify the values, or at least to give the AI model some heuristics it can use as a basis for evaluation. https://jeff962.github.io/Ethics_Exposed/Supplemental/Value_Graph
I also had Claude run through some typical difficult scenarios to see how it applied the values compared to other approaches. The results were encouraging enough to suggest trying it for real. https://jeff962.github.io/Ethics_Exposed/Supplemental/Difficult-AI-Scenarios
Thanks again for the comment! I was hoping to get some push-back on whether this idea has merit.