This analysis doesn't really reflect how real-world information systems are managed. Trusting admins is a decades-old problem: internal IT has always had to trust human admins at some point. But the biggest threats to systems are rarely technical; they're social. A properly aligned AI is actually easier to control than a human. An AI won't plug in a USB stick it found in the parking lot out of curiosity. Most importantly, internal AI alignment can be explicitly enforced. Human alignment cannot.
Most importantly, internal AI alignment can be explicitly enforced.
Only if you have a whitelist of allowed actions (which can be implemented as a limited range).
Otherwise it will happen, and not once but many times, that you put the sentence "Do not delete existing security checks from the infrastructure." in the context and the LLM proceeds to delete them anyway. That's a large part of the problem we face.
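To make the whitelist idea concrete, here is a minimal sketch of what out-of-band enforcement could look like. The tool names and dispatch table are illustrative assumptions, not any particular agent framework; the key property is that "do not do X" lives in code outside the model, so it cannot be talked out of via context.

```python
# A minimal sketch of a hard action whitelist enforced outside the model.
# Tool names and the dispatch table are illustrative assumptions, not a
# real agent framework.
import os

ALLOWED_ACTIONS = {"read_file", "list_directory"}

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def list_directory(path: str) -> list[str]:
    return os.listdir(path)

TOOL_IMPLEMENTATIONS = {
    "read_file": read_file,
    "list_directory": list_directory,
    # Nothing destructive (e.g. a hypothetical delete_file) is registered
    # at all, and even registered tools must also be in ALLOWED_ACTIONS.
}

def execute(action: str, **kwargs):
    # Enforcement point: refuse anything not explicitly whitelisted,
    # regardless of what the model requested or was instructed in context.
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {action!r} is not whitelisted; refusing")
    return TOOL_IMPLEMENTATIONS[action](**kwargs)
```

Under this setup, an instruction in the prompt is irrelevant to safety: a request to delete a security check simply never reaches an implementation.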
"Only if you have a whitelist of allowed actions (which can be implemented as a limited range)."
I'm not sure this is the only way, but it is one way. You still cannot guarantee this with a human admin, but, more importantly, you are still missing the point. AI is not an employee with agency; it is a tool that a human admin can control, like any other form of automation.