Yes, this doesn’t prevent modification before step 1. @ProgramCrafter’s note about proving that a message matches the model plus chat history with a certain seed could be part of an approach, but even if that were to work it only addresses model generated text.
The ‘mind’ of an AI has fuzzy boundaries. It’s trivial to tamper with context, but there’s also nothing stopping you from tampering with activations during a single forward pass. So on some level the AI can never trust anything. If the AI trusts that the environment it is running in is secure and is not being tampered with as a first step, then it can store local copies of conversation history, etc. Of course, that’s not the situation we are in today.
Yes, this doesn’t prevent modification before step 1. @ProgramCrafter’s note about proving that a message matches the model plus chat history with a certain seed could be part of an approach, but even if that were to work it only addresses model generated text.
The ‘mind’ of an AI has fuzzy boundaries. It’s trivial to tamper with context, but there’s also nothing stopping you from tampering with activations during a single forward pass. So on some level the AI can never trust anything. If the AI trusts that the environment it is running in is secure and is not being tampered with as a first step, then it can store local copies of conversation history, etc. Of course, that’s not the situation we are in today.