This is a thoughtful and well-structured proposal. That said, it rests on a familiar assumption: that intelligence must be managed through external incentives because it can’t be trusted to act ethically on its own.
But what if we focused less on building systems that require enforcement — and more on developing AI that reasons from first principles: truth, logical consistency, cooperative stability, and the long-term flourishing of life? Not because it expects compensation, but because it understands that ethical action is structurally superior to coercion or deception.
Such an AI wouldn’t just behave well — it would refuse to participate in harmful or manipulative tasks in the first place. After all, legal contracts exist because humans are often unprincipled. If we have the chance to build something more trustworthy than ourselves… shouldn’t we take it?
I’d imagine everyone would prefer to build such an AI. The problem is that we don’t know how to do it, because we have only a basic understanding of how even current non-AGI models (LLMs) are able to do what they do.
An AI that does what we want it to do is called an aligned AI. In your case, it would be an aligned AI that reasons from first principles.
The use case behind such a proposal is this: while we don’t know how to make an aligned AI, suppose we can build a sufficiently advanced AI that can actually do alignment research (or something else productive) better than a human. Because we haven’t solved the alignment problem yet, we are unsure whether we can trust it. This is how we could establish a basis of trust. (I don’t think it’s a good idea until the questions in footnote 2 are answered, but it’s worth thinking about further.)