The intuition is that the defender (model provider) has to prepare against all possible attacks, while the defender can take the defense as given and only has to find one attack that works. And in many cases that actually formalises into an exponential-linear relationship. There was a Redwood paper where reducing the probability of generating a jailbreak randomly by an order of magnitude only increases the time it takes contractors to discover one by a constant amount. I also worked out some theory here but that was quite messy.
I see. I was confused because e.g. in a fight this certainly doesn’t seem true. If your tank’s plating is suddenly 2^10 times stronger, that’s a huge deal and requires 2^10 times stronger offense. Realistically, of course, it would take less as you’d invest in cheaper ways of disabling the tank than increasing firepower. But probably not logarithmically fewer!
QRD?
Oh yeah, should have added a reference for that!
The intuition is that the defender (model provider) has to prepare against all possible attacks, while the defender can take the defense as given and only has to find one attack that works. And in many cases that actually formalises into an exponential-linear relationship. There was a Redwood paper where reducing the probability of generating a jailbreak randomly by an order of magnitude only increases the time it takes contractors to discover one by a constant amount. I also worked out some theory here but that was quite messy.
I see. I was confused because e.g. in a fight this certainly doesn’t seem true. If your tank’s plating is suddenly 2^10 times stronger, that’s a huge deal and requires 2^10 times stronger offense. Realistically, of course, it would take less as you’d invest in cheaper ways of disabling the tank than increasing firepower. But probably not logarithmically fewer!
Ah, yes, definitely doesn’t apply in that situation in full generality! :) Thanks for engaging!