For example, instead of working on separate computers, they might want to work as different users (or the same user!) on the same Linux server.
Oh god… I hope you’re wrong about this providing a significant efficiency boost. If working on the same server is viable, then presumably there’s a pretty homogeneous software stack, which would also mean Docker images can often be reused?
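To make the image-reuse idea concrete, here’s a minimal sketch of what that could look like, assuming the docker-py SDK and a hypothetical shared base image called agent-base; each agent runs as its own unprivileged UID, loosely mirroring the “different users on the same server” setup described above:

```python
# Hypothetical sketch: launch several agent sandboxes from one shared image,
# each as a different unprivileged user. Assumes the docker-py SDK
# (pip install docker) and an image named "agent-base" that already
# contains the homogeneous software stack.
import docker

client = docker.from_env()

AGENT_UIDS = [1001, 1002, 1003]  # one Unix user per agent

containers = []
for uid in AGENT_UIDS:
    container = client.containers.run(
        image="agent-base",            # shared, reusable image
        command="sleep infinity",      # placeholder workload
        user=f"{uid}:{uid}",           # run as a distinct non-root UID/GID
        network_mode="none",           # no network, in the spirit of SL5-style isolation
        read_only=True,                # immutable root filesystem
        tmpfs={"/tmp": "size=256m"},   # scratch space only
        detach=True,
        name=f"agent-{uid}",
    )
    containers.append(container)

print([c.name for c in containers])
```

The point being: if the stack is homogeneous enough that everyone can share one server, that same homogeneity lets one image serve every agent, with per-UID separation doing the coarse isolation.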
The AI company builds new security-critical infra really fast for the sake of security
This is basically in our SL5 recommendations, with the assumption that the model you’d use for this didn’t merit being trained in an SL5 data center and so is not a catastrophic risk. But maybe it could already be misaligned and sneaky enough to introduce a backdoor even if it’s not smart enough to be a catastrophic risk? Do we have a prediction for the number of model generations between models still being dumb enough to trust and models being smart enough to be a catastrophic risk? (With the best case being 1.) If we buy the AI-2027 distinction between superhuman coder and superhuman AI researcher, it could even be the case that the big scary model doesn’t write any code, and only comes up with ideas which are then implemented by the superhuman coder, or whatever the smartest coder you trust happens to be.
At some point, I expect the AI company will want to use its smartest models to create the infrastructure necessary to serve itself over the internet while remaining SL5. Hopefully this only happens after they’re justifiably confident that their model is aligned.
Another idea is to require that any infra created by an untrusted model be formally verified against a specification written by a human or trusted model, though I don’t know enough about formal verification to say whether that’s good enough.
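To illustrate the division of labor being proposed, here’s a toy Lean 4 sketch (all names hypothetical): the trusted party writes and audits only the spec, while the untrusted model supplies both the implementation and a machine-checked proof that it meets the spec.

```lean
-- Toy sketch (Lean 4), all names hypothetical. A trusted party writes and
-- audits only `Spec`; the untrusted model supplies `allow` and the proof.

-- Trusted spec: whatever the implementation does, it may never grant more
-- requests than the cap.
def Spec (allow : Nat → Nat → Nat) : Prop :=
  ∀ cap requested, allow cap requested ≤ cap

-- Untrusted implementation (written by the model).
def allow (cap requested : Nat) : Nat :=
  min requested cap

-- Proof obligation, also produced by the model but checked by the kernel,
-- so the trusted party never has to read the implementation itself.
theorem allow_meets_spec : Spec allow := by
  intro cap requested
  exact Nat.min_le_right requested cap
```

Whether this scales to real security-critical infrastructure (networking code, schedulers, distributed systems) is exactly the open question; the sketch only shows who writes what and what the trusted party actually has to read.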