I’d imagine everyone would prefer to build such an AI. The problem is that we don’t know how to do it, because we have only a basic understanding of how even current non-AGI models (LLMs) are able to do what they do.
An AI that does what we want it to do is called an aligned AI. In your case it would be an aligned AI that reasons from first principles.
The use case behind such a proposal is this: while we don’t know how to make an aligned AI, suppose we can build a sufficiently advanced AI that can actually do alignment research (or something else productive) better than a human. Because we haven’t solved the alignment problem yet, we are unsure whether we can trust it. This is how we could establish a basis of trust. (I don’t think it’s a good idea until the questions in footnote 2 are answered, but it’s worth thinking about further.)