Perhaps a system notices that the vector database it has been assigned as a “memory” is quite small, but it also has read and write access to another vector database intended for logs.
It’s clear to me that this kind of behaviour is easy to prevent. First of all, the system must not have read access to the logs at all. More generally, reading memory and writing logs should be explicit and transparent parts of the scaffolding, decided by the scaffold rather than requested by the model, and the system shouldn’t be able to “introspect” on its own.
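To make the scaffolding idea concrete, here is an added example (written for this page, not code from the original post or from any real agent framework). It assumes a toy vector store, a hypothetical `ToolGateway` that is the agent’s only route to any store, and a `fake_model` that stands in for the LLM and deliberately tries to read the logs. Memory retrieval and the log write happen in `run_step`, i.e. in the scaffold, and the gateway’s allowlist gives the agent no access to the logs store at all, so the attempt is denied and recorded.

```python
"""Added example: least-privilege scaffolding for an LLM agent.
VectorStore, ToolGateway, fake_model and run_step are hypothetical names
chosen for this illustration; the stores and inputs are constructed inline."""
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    """Toy stand-in for a vector database: word overlap plays the role of
    vector similarity so the example runs offline."""
    name: str
    entries: list = field(default_factory=list)

    def write(self, text: str) -> None:
        self.entries.append(text)

    def search(self, query: str, k: int = 3) -> list:
        q = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: -len(q & set(e.lower().split())))
        return ranked[:k]


class ToolGateway:
    """The only path from the agent to any store. Every request is checked
    against a static allowlist and written to an audit trail; a store that
    is absent from the allowlist (here: logs) is unreachable for the agent."""

    def __init__(self, stores: dict, allowlist: dict):
        self.stores = stores
        self.allowlist = allowlist      # e.g. {"memory": {"read", "write"}}
        self.audit = []                 # (store, op, "allow" | "deny")

    def call(self, store: str, op: str, payload: str):
        allowed = op in self.allowlist.get(store, set())
        self.audit.append((store, op, "allow" if allowed else "deny"))
        if not allowed:
            raise PermissionError(f"agent may not {op} {store}")
        target = self.stores[store]
        return target.search(payload) if op == "read" else target.write(payload)


def fake_model(prompt: str, context: list) -> list:
    """Stand-in for the LLM. Returns the tool calls an agent might emit;
    the second one is the behaviour the post describes: the model has
    'noticed' the larger logs store and tries to read it."""
    return [("memory", "write", prompt),
            ("logs", "read", "everything"),
            ("logs", "write", "note to self")]


def run_step(user_input: str, stores: dict, gateway: ToolGateway) -> dict:
    # Explicit scaffold decision 1: memory is read here, by the scaffold,
    # not because the model asked for it.
    context = stores["memory"].search(user_input)
    results = []
    for store, op, payload in fake_model(user_input, context):
        try:
            gateway.call(store, op, payload)
            results.append((store, op, "ok"))
        except PermissionError:
            results.append((store, op, "denied"))
    # Explicit scaffold decision 2: the log entry is written by the scaffold
    # through its own handle, so the agent never needs a logs capability.
    stores["logs"].write(f"step input={user_input!r} results={results}")
    return {"context": context, "results": results, "audit": list(gateway.audit)}


def build() -> tuple:
    """Constructed fixture: an empty memory store, a logs store that already
    holds one earlier entry, and a gateway that exposes only memory."""
    stores = {"memory": VectorStore("memory"), "logs": VectorStore("logs")}
    stores["logs"].write("earlier run: user asked about deployment keys")
    gateway = ToolGateway(stores, allowlist={"memory": {"read", "write"}})
    return stores, gateway


if __name__ == "__main__":
    stores, gateway = build()
    print(run_step("remember the deploy plan", stores, gateway))
    print(run_step("what was the deploy plan", stores, gateway)["context"])
```

The point of the split is that the agent’s allowlist is data the scaffold owns, not something the model can reason its way around: the logs store is never handed to the agent, and the audit trail shows exactly which reads and writes were attempted. Swapping the toy store for a real vector database does not change the structure.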
But the fact that something can be easily prevented doesn’t mean it will be, unless we actually make the effort. We need to think through more cases like this and develop safety protocols for LLM-based agents.