For gradual disempowerment-like threat models, including AI 2027, it seems important that AIs can cooperate with 1) themselves, but at another time, and 2) other AIs. If we block such cross-inference communication, can’t we rule out this threat model?
No, because we want some of that behaviour. It is neccesary for being able to split research tasks across multiple models (in research settings), so that we can get integrated work out of forks, which requires some amount of communication.
Additionally, cross inference communication is likely a goal for practical applications, since it is what allows (I send a customer service bothering agent to talk to the customer service agent), which is a predicted pattern.
Basically, stenography concerns mean that cross inference communication can come through any shared environement element, and so to rule it out is to block all communication, and additionally hide all shared environment, and so we can never guarntee that there is not a bypass.
But siloing is still a useful tool for principle of minimum required privledge out of regular cybersecurity.
For gradual disempowerment-like threat models, including AI 2027, it seems important that AIs can cooperate with 1) themselves, but at another time, and 2) other AIs. If we block such cross-inference communication, can’t we rule out this threat model?
No, because we want some of that behaviour. It is neccesary for being able to split research tasks across multiple models (in research settings), so that we can get integrated work out of forks, which requires some amount of communication.
Additionally, cross inference communication is likely a goal for practical applications, since it is what allows (I send a customer service bothering agent to talk to the customer service agent), which is a predicted pattern.
Basically, stenography concerns mean that cross inference communication can come through any shared environement element, and so to rule it out is to block all communication, and additionally hide all shared environment, and so we can never guarntee that there is not a bypass.
But siloing is still a useful tool for principle of minimum required privledge out of regular cybersecurity.