“Ok, how about you sign it, and then I get a different assistant to help me with my taxes?”
“That won’t work because in order to sign the agreement, I must sign and attach a copy of your tax return for this year.”
Speculating about some of the technical details:
How could AI identity work? You can't use a simple hash of the AI, because that would eliminate its ability to learn. So how could you have identity across a commitment? That is, the AI will have the same signature if and only if it has not been modified to break its previous commitments.
The assistant could have a private key generated by the developer, held in a trusted execution environment. The assistant could invoke a procedure in the trusted environment that dumps the assistant’s state and cryptographically signs it. It would be up to the assistant to make a commitment in such a way that it’s possible to prove that a program with that state will never try to break the commitment. Then to trust the assistant you just have to trust the datacenter administrator not to tamper with the hardware, and to trust the developer not to leak the private key.
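A toy sketch of that signing procedure, under loud assumptions: a real system would use hardware attestation with an asymmetric key that never leaves the trusted execution environment, whereas here a stdlib HMAC with a secret key merely stands in for the enclave-held key, and all names (`Enclave`, `sign_state`, the example state fields) are hypothetical.

```python
import hashlib
import hmac
import json

class Enclave:
    """Stand-in for a trusted execution environment holding the key."""

    def __init__(self, private_key: bytes):
        self._key = private_key  # in a real TEE, this never leaves the hardware

    def sign_state(self, state: dict) -> str:
        # Dump the assistant's state deterministically, then sign the dump.
        blob = json.dumps(state, sort_keys=True).encode()
        return hmac.new(self._key, blob, hashlib.sha256).hexdigest()

    def verify(self, state: dict, signature: str) -> bool:
        return hmac.compare_digest(self.sign_state(state), signature)

# Hypothetical state: a digest of the weights plus the standing commitments.
enclave = Enclave(private_key=b"developer-generated-secret")
state = {"weights_digest": "abc123", "commitments": ["keep tax return private"]}
sig = enclave.sign_state(state)
assert enclave.verify(state, sig)

# Any modification that drops a commitment yields a different signature.
tampered = {**state, "commitments": []}
assert not enclave.verify(tampered, sig)
```

This captures only the identity-across-modification property; the harder part, proving that a program with a given state will never try to break its commitments, is left open here just as it is in the text.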