Algon comments on Jan’s Shortform

Algon 15 Jun 2025 20:36 UTC
2 points
but exponentially better defenses only require linearly better offense

QRD?
- Jan 15 Jun 2025 20:49 UTC
  4 points
  Oh yeah, should have added a reference for that!
  The intuition is that the defender (model provider) has to prepare against all possible attacks, while the attacker can take the defense as given and only has to find one attack that works. And in many cases that actually formalises into an exponential-linear relationship. There was a Redwood paper where reducing the probability of randomly generating a jailbreak by an order of magnitude only increased the time it took contractors to discover one by a constant amount. I also worked out some theory here but that was quite messy.
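  A toy way to see the exponential-linear relationship. This is a hypothetical model for illustration, not the method or numbers from the Redwood paper: `base_hours` and `step_hours` are made-up constants, and the guided attacker is simply assumed to need a fixed extra amount of time per order of magnitude of defense. For contrast, an attacker who samples attempts blindly needs about 1/p tries, so their cost does scale with the defense.

```python
import math


# Hypothetical model (assumption, not from the paper): a guided attacker's search
# time grows by a fixed step per order of magnitude that p shrinks:
#   T = base_hours + step_hours * log10(1 / p)
def guided_attack_time(p_jailbreak: float, base_hours: float = 1.0, step_hours: float = 1.0) -> float:
    """Attacker time under the assumed exponential-linear relationship."""
    return base_hours + step_hours * math.log10(1.0 / p_jailbreak)


# Blind random sampling: each attempt independently succeeds with probability p,
# so the number of tries is geometric with expected value 1 / p.
def random_sampling_attempts(p_jailbreak: float) -> float:
    """Expected number of independent random tries until the first jailbreak."""
    return 1.0 / p_jailbreak


if __name__ == "__main__":
    # Each row is a defense that is 10x better than the previous one.
    for k in range(1, 6):
        p = 10.0 ** -k
        print(
            f"p={p:.0e}  random tries={random_sampling_attempts(p):>8.0f}"
            f"  guided hours={guided_attack_time(p):.1f}"
        )
```

  In this sketch each 10x improvement in defense multiplies the blind attacker's expected tries by 10 but adds only `step_hours` to the guided attacker's time, which is the "exponentially better defenses, linearly better offense" shape.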
  - Algon 15 Jun 2025 20:53 UTC
    3 points
    I see. I was confused because, e.g., in a fight this certainly doesn't seem true. If your tank's plating is suddenly 2^10 times stronger, that's a huge deal and requires 2^10 times stronger offense. Realistically, of course, it would take less, as you'd invest in cheaper ways of disabling the tank rather than in more firepower. But probably not logarithmically less!
    - Jan 15 Jun 2025 21:15 UTC
      3 points
      Ah, yes, definitely doesn’t apply in that situation in full generality! :) Thanks for engaging!