Technologies that allow workers to be more isolated from each other gain you both convenience (because your coworkers no longer accidentally mess up what you’re doing) and also security (because you can remove your coworker’s permission to affect the code you’re running), but generally reduce efficiency. When we try to buy efficiency at the cost of convenience, we might lose security too.
Hmm, this feels less likely to me. Isolation can often be an efficiency benefit, because one employee’s mistake doesn’t propagate and mess up everyone else’s work, and this benefit scales as the number of employees increases. This assumes that labor is at least somewhat unreliable, but I do expect AI labor to be unreliable overall, simply because reliable AI labor will likely be much more expensive and won’t look cost-effective to deploy in most situations.
I’d expect that you buy efficiency by building more fine-grained configuration mechanisms and requiring that employees use them. This will also become easier with AI, since you can have specialized AI assistants that understand the configuration mechanisms well and can set them for all experiments (unlike the current status quo, where all your employees have to be separately trained on how to use the configurations, and many of them forget the details and just copy configurations from elsewhere).
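To make “more fine-grained configuration mechanisms” concrete, here’s a minimal sketch of what one could look like; the field names, path layout, and thresholds are hypothetical, not a description of any real lab’s tooling.

```python
# Hypothetical sketch of a fine-grained experiment config that a specialized AI assistant
# could fill in on an employee's behalf. Field names, paths, and thresholds are made up.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    owner: str                # the human or AI agent responsible for this run
    read_paths: list[str]     # datasets/checkpoints the job intends to read
    write_prefix: str         # the only location the job intends to write to
    gpu_hours_budget: float   # declared compute budget for the experiment
    justification: str = ""   # free-text rationale, reviewable later

    def problems(self) -> list[str]:
        """Return a list of issues; an empty list means the config looks acceptable."""
        issues = []
        if not self.write_prefix.startswith(f"/experiments/{self.owner}/"):
            issues.append("write_prefix should live under the owner's namespace")
        if self.gpu_hours_budget > 1000 and not self.justification.strip():
            issues.append("large compute budgets need a written justification")
        return issues
```

The point is that the declared read/write scopes and compute budget become explicit and machine-checkable, rather than living in each employee’s head.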
In general it seems like this should improve security by creating more info and legibility around what is happening (which in turn gives you more affordances to enforce security). Some examples (not necessarily about AI):
For code that is costly to run (e.g. it uses lots of compute, causes physical-world effects, or affects production systems), spending labor to run automated bug-finding tools that generate lots of false positives, and addressing the results (sketched below, after these examples). This directly increases security because the code ends up with fewer bugs.
Breaking up big jobs into many smaller ones that are easier to schedule across your compute fleet. This can enable you to isolate the small jobs from each other, which reduces the attack surface.
Requiring detailed analysis and justification before using significant amounts of compute for experiments. This should increase security against rogue internal deployments, as it becomes easier to catch the AI in a lie.
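As an illustration of the first example, here’s a rough sketch of a noisy automated bug-finding pass; the specific checks (mutable default arguments, bare excepts) are just stand-ins for whatever checks a real pipeline would run.

```python
# A rough sketch (not a real tool) of a noisy bug-finding pass over Python source files.
# It flags patterns that are often, but not always, bugs, i.e. it has many false positives,
# so someone (or some AI) has to spend labor triaging the findings.
import ast
import sys

class NoisyBugFinder(ast.NodeVisitor):
    def __init__(self) -> None:
        self.findings: list[tuple[int, str]] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # Mutable default arguments are a classic source of state leaking between calls.
        for default in node.args.defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                self.findings.append((node.lineno, f"mutable default argument in {node.name}()"))
        self.generic_visit(node)

    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        # A bare `except:` can silently swallow real failures in expensive jobs.
        if node.type is None:
            self.findings.append((node.lineno, "bare except"))
        self.generic_visit(node)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            tree = ast.parse(f.read(), filename=path)
        finder = NoisyBugFinder()
        finder.visit(tree)
        for lineno, message in finder.findings:
            print(f"{path}:{lineno}: {message}")
```

Something like this would run over the relevant code before an expensive or production-affecting job is launched, with the findings triaged before approval.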
That said, I did think of one example that worsens security—switching from a memory-safe language like Python to one that requires manual memory management like C. Though in this case, it’s that the “security infra” (automatic memory management) itself is too compute-expensive to run, which doesn’t seem like it will apply in the AI case.
Overall my very weak guess is that the net effect will be to increase security. Of course, the part where you build more fine-grained configuration mechanisms is “novel (potentially) security-critical infrastructure”, so this does support your broader point. (When I say it will increase security, this is under the assumption that the novel infra isn’t sabotaged by scheming AIs.)
I agree that fine-grained configuration mechanisms probably increase security, and this might net increase security, as I say in the post.
I agree that the increasing value of compute might increase the isolation you use, for the reason you said. One reason I’m skeptical is that you can get almost all of that value by having AIs voluntarily adopt mechanisms that generally isolate their jobs from the other jobs that are running (e.g. naming conventions about who is allowed to read or write what), which get you the reliability benefits without getting any security.
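A minimal sketch of what such a voluntary convention could look like, assuming a hypothetical path layout where each job gets its own directory:

```python
# Hypothetical naming convention: a job writes only under its own directory, and reads only
# from its own directory or from read-only dataset paths. Nothing enforces this, so a
# misbehaving (or scheming) job can simply ignore it: it buys reliability, not security.
def may_write(job_name: str, path: str) -> bool:
    return path.startswith(f"/jobs/{job_name}/")

def may_read(job_name: str, path: str) -> bool:
    return may_write(job_name, path) or path.startswith("/datasets/")

# An agent (or a linter over its planned file accesses) can self-check before running:
assert may_write("exp-123", "/jobs/exp-123/checkpoints/step_100.pt")
assert not may_write("exp-123", "/jobs/exp-456/checkpoints/step_100.pt")
```

The same check only becomes a security mechanism if something the job can’t tamper with (e.g. the filesystem’s permission layer) enforces it.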
I agree that fine-grained configuration mechanisms probably increase security, and this might net increase security, as I say in the post.
You mention permission systems, which are certainly a big deal, but I didn’t see anything about broader configuration mechanisms, many of which can be motivated solely by efficiency and incidentally help with security. (I was disputing your efficiency → less security claim; permission mechanisms aren’t a valid counterargument, since they aren’t motivated by efficiency.)
One reason I’m skeptical is that you can get almost all of that value by having AIs voluntarily adopt mechanisms that generally isolate their jobs from the other jobs that are running (e.g. naming conventions about who is allowed to read or write what), which get you the reliability benefits without getting any security.
Hmm, I think this assumes the AIs will be more reliable than I expect them to be. You would want this to be reliable enough that your (AI or human) software engineers ~never have to consider non-isolation as a possible cause of bugs (unless explicitly flagged otherwise); I’d guess that corresponds to about 6 nines of reliability at following the naming conventions? (Which might get you to 8 nines of reliability at avoiding non-isolation bugs, since most naming convention errors won’t lead to bugs.)
Some complete guesses: human software engineers might get 0.5-2 nines of reliability at following the naming conventions; a superhuman coder model might hit ~4 nines if you made this a medium priority; and if you add a review step that focuses solely on checking adherence to the naming conventions, with the critique done by a model somewhat more powerful than a current frontier model, you might hit 6 nines. But this is now a non-trivial expense; it’s not clear to me it would be worth it.
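To make the “nines” arithmetic concrete, here’s a back-of-the-envelope sketch; the daily task volume is made up, and the 1% bug fraction is just what’s needed to match the 6-nines-to-8-nines jump mentioned above.

```python
# Back-of-the-envelope numbers for the reliability guesses above; task volume is illustrative.
import math

def nines(p_success: float) -> float:
    """Convert a success probability into 'nines' (e.g. 0.999 -> 3.0)."""
    return -math.log10(1.0 - p_success)

p_follow_convention = 1 - 1e-6   # 6 nines at following the naming conventions
bug_given_violation = 0.01       # assume ~1% of violations cause a non-isolation bug
p_no_bug = 1 - (1 - p_follow_convention) * bug_given_violation
print(f"nines at avoiding non-isolation bugs: {nines(p_no_bug):.1f}")  # ~8.0

tasks_per_day = 100_000          # hypothetical fleet-wide task volume
print(f"expected non-isolation bugs per day: {tasks_per_day * (1 - p_no_bug):.4f}")  # ~0.0010
```

Whether that residual bug rate is acceptable, and whether the review step needed to reach 6 nines is worth its cost, is the judgment call in the last sentence above.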