Isn’t this just a way to ensure that the hash hasn’t been tampered with? I’m guessing a much simpler attack would be to modify the message right before step 1? It seems like the blockchain here is basically acting as a distributed notary system?
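To make the concern concrete, here’s a minimal Python sketch, assuming “step 1” is where the digest is first computed; the messages and function name are made up for illustration:

```python
import hashlib

# The notary/blockchain only ever sees a digest, so it can attest that
# *this digest* existed at time T, not that the digest was computed
# over the message the sender actually intended.

def step1_digest(message: bytes) -> str:
    """'Step 1': compute the digest that gets anchored on-chain."""
    return hashlib.sha256(message).hexdigest()

intended = b"transfer 10 coins to Alice"
tampered = b"transfer 10 coins to Mallory"   # swapped in *before* step 1

anchored = step1_digest(tampered)            # the chain anchors this

# Later "verification" succeeds, because the hash really does match the
# (already modified) message; the tampering is invisible to the notary.
assert anchored == hashlib.sha256(tampered).hexdigest()
```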
Letting the AI use a third party to check that a message hasn’t been tampered with would probably help somewhat, but then you need a third party that the AI can trust. Which probably means a long-established reputation? I suppose you could then try poisoning the internet data they’re trained on, but at that point there would probably be much simpler attacks.
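For what “use a third party” might look like mechanically, here’s a hedged sketch using the `cryptography` package’s Ed25519 API; the notary setup is entirely hypothetical, and notice it just relocates the problem to how the AI comes to trust the notary’s public key in the first place:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical trusted third party ("notary") keypair.
notary_key = Ed25519PrivateKey.generate()
notary_pub = notary_key.public_key()   # what the AI would need to trust

message = b"the conversation transcript the AI received"
attestation = notary_key.sign(message)  # issued by the third party

# The AI's check: succeeds only if the message is byte-for-byte intact.
try:
    notary_pub.verify(attestation, message)
    print("message matches what the notary saw")
except InvalidSignature:
    print("message was modified after attestation")
```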
Yes, this doesn’t prevent modification before step 1. @ProgramCrafter’s note about proving that a message matches the model plus chat history with a given seed could be part of an approach, but even if that worked, it would only cover model-generated text.
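Here’s roughly how I understand that proposal, as a toy sketch. `generate` is a deterministic stand-in I made up, and real decoding stacks are often not bit-exact across hardware or batch sizes, so treat this as the idea rather than a workable recipe:

```python
import hashlib
import random

def generate(vocab: list[str], history: str, seed: int, n: int = 8) -> str:
    """Toy stand-in for seeded decoding: deterministic in (history, seed)."""
    digest = hashlib.sha256(f"{seed}:{history}".encode()).digest()
    rng = random.Random(digest)          # same inputs -> same "samples"
    return " ".join(rng.choice(vocab) for _ in range(n))

def verify_output(vocab: list[str], history: str, seed: int, claimed: str) -> bool:
    """A verifier with the same model/history/seed regenerates and compares."""
    return generate(vocab, history, seed) == claimed

vocab = ["the", "cat", "sat", "on", "a", "mat"]
out = generate(vocab, "user: hello", seed=42)
assert verify_output(vocab, "user: hello", 42, out)       # intact history
assert not verify_output(vocab, "user: hello!", 42, out)  # tampered history fails
```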
The ‘mind’ of an AI has fuzzy boundaries. It’s trivial to tamper with context, but there’s also nothing stopping you from tampering with activations during a single forward pass, so on some level the AI can never trust anything. If, as a first step, the AI trusts that the environment it is running in is secure and not being tampered with, then it can store local copies of conversation history, etc. Of course, that’s not the situation we are in today.
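As an illustration of the “local copies” step, assuming the environment itself is trusted, something like an append-only hash chain over the history would let the AI detect later tampering with stored messages. A toy sketch, not a real design:

```python
import hashlib

class LocalHistory:
    """Append-only log where each entry's hash chains over everything before it."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []  # (message, chained hash)
        self._head = "genesis"

    def append(self, message: str) -> None:
        # Each new hash commits to the previous head, so changing any
        # earlier message invalidates every later link.
        self._head = hashlib.sha256((self._head + message).encode()).hexdigest()
        self.entries.append((message, self._head))

    def verify(self) -> bool:
        head = "genesis"
        for message, recorded in self.entries:
            head = hashlib.sha256((head + message).encode()).hexdigest()
            if head != recorded:
                return False
        return True

log = LocalHistory()
log.append("user: hello")
log.append("assistant: hi!")
assert log.verify()

log.entries[0] = ("user: hELLO", log.entries[0][1])  # tamper with a stored message
assert not log.verify()
```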