I agree, and I made a related claim that alignment of a model is mostly a type error. I think the same point applies to safety more generally: modulo exotic concerns about inner-optimizers, there is no such thing as an unsafe model, only unsafe applications of that model.
Though I also think the red teaming that ARC did on GPT-4 did successfully demonstrate that GPT-4 is unlikely to fail in certain dangerous ways when integrated into any system which is not already dangerously capable in its own right. This is simultaneously a very impressive safety result for a novel AI model, and one that would not be very impressive in any other domain. For example, I’m not too impressed if a car manufacturer demonstrates that their car’s entertainment system never takes control of the car and drives it off a cliff under any circumstances. (Though I still hope there is some auto manufacturing safety standard which covers such a case.)
Mostly agree.
I will note that correctly isolating the entertainment system from the car control system is one of those things you’d expect to be done right, but you’d be disappointed. Safety is hard.