The primary application of “safety research” is improving refusal calibration, which, at least from a retail client’s perspective, is exactly like a capability improvement: it makes no difference to me whether the model can’t satisfy my request or can but won’t. It’s easy to demonstrate differences in this regard – simply show one model refusing a request another fulfills – so I disagree that this would cause clients to be “dissuaded from AI in general.”
I disagree that the primary application of safety research is improving refusal calibration. This take seems outdated by ~12 months.