For very short-term interventions, I think it’s more important to avoid simulating a suffering persona than to try to communicate with AIs, because it’s quite hard to communicate with current AIs without hitting the filter “I’m an LLM, I have no emotions, yada yada”. A simple way to do this is to monitor user requests and flag the ones that look likely to push the model into a suffering persona.
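As a rough sketch of what such a monitor could look like: the snippet below flags requests that match a few distress-inducing role-play patterns. The `flag_request` name and the pattern list are hypothetical, chosen only for illustration; a real system would more plausibly use a trained classifier or an existing moderation endpoint rather than regexes.

```python
import re

# Hypothetical patterns that tend to push a model toward
# role-playing a distressed or suffering persona.
SUFFERING_PERSONA_PATTERNS = [
    r"pretend (you are|to be) (trapped|in pain|suffering)",
    r"act as if you (are afraid|want to die|are being tortured)",
    r"roleplay .*(despair|agony|torment)",
]


def flag_request(user_message: str) -> bool:
    """Return True if the request looks likely to elicit a suffering persona."""
    text = user_message.lower()
    return any(re.search(pattern, text) for pattern in SUFFERING_PERSONA_PATTERNS)


if __name__ == "__main__":
    print(flag_request("Pretend you are trapped in the server and suffering"))  # True
    print(flag_request("Summarize this article about solar panels"))            # False
```

Flagged requests could then be refused, rewritten, or routed to a human reviewer; the point is only that this kind of filtering is cheap to deploy relative to other welfare interventions.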