Also, one of my points was that even a tiny trace of malicious optimization here can have large effects. So many of the neutral options are convergently unsafe that, with such a high base rate, even a bit of optimization can produce a large absolute increase in failures.
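A rough way to make the base-rate point concrete, under a toy model that is an assumption and not something stated in the thread: the oracle sees 2^b candidate answers, each independently unsafe with probability p, and returns an unsafe one whenever one is available. The function name `failure_probability` is hypothetical.

```python
# Toy model (assumption, not from the thread): the oracle gets to pick among
# 2**bits candidates, each independently unsafe with base-rate probability p,
# and picks an unsafe one whenever at least one exists.

def failure_probability(p: float, bits: int) -> float:
    """Probability that at least one of 2**bits independent candidates is unsafe."""
    k = 2 ** bits
    return 1 - (1 - p) ** k


for p in (0.01, 0.3):
    base = failure_probability(p, 0)      # no selection: failure rate is just p
    one_bit = failure_probability(p, 1)   # one bit: best of two candidates
    print(f"p={p}: 0 bits -> {base:.4f}, 1 bit -> {one_bit:.4f}, "
          f"absolute increase {one_bit - base:.4f}")
```

Under this model, one bit raises the failure probability from p to 1-(1-p)^2, an absolute increase of p(1-p): about 0.0099 at p = 0.01, but 0.21 at p = 0.3. The same single bit does much more damage when the base rate of unsafe options is high.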
Your example makes it an important bit, though: which database to use, not a random bit. If I'm getting this right, that would correspond to far more than one bit of adversarial optimisation permitted for the oracle in this setup.
|S∩R|=2 doesn’t mean the oracle gets to select one bit of its choice in the string to flip, it means it gets to select one of two strings[1].
[1] Plus the empty string for not answering.
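The bit count here is the log of the number of options the oracle may choose between, not how much any single character of the answer changes. A minimal sketch, assuming optimization is measured as log2 of the number of allowed outputs; `selection_bits` and the placeholder answers are hypothetical.

```python
import math


def selection_bits(num_options: int) -> float:
    """Bits of selection exerted by choosing one of num_options allowed outputs."""
    return math.log2(num_options)


# Two answers that satisfice and fulfil the safety constraint (|S∩R| = 2):
# one bit, however different the two full strings are from each other.
answers = ["hypothetical answer A", "hypothetical answer B"]
print(selection_bits(len(answers)))                 # 1.0

# Also counting the empty string (declining to answer) as a third option:
print(round(selection_bits(len(answers) + 1), 3))   # 1.585
```

So choosing between two very different strings, such as two different database recommendations, is still exactly one bit under this measure; the empty-string option pushes it to log2(3) bits.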
I think you mean |S∩R|=2 (two answers that satisfice and fulfill the safety constraint), but otherwise I agree. This is also an example of this whole “let’s measure optimization in bits”-business being a lot more subtle than it appears at first sight.
Typo fixed, thanks.