Max H comments on Systems that cannot be unsafe cannot be safe

Max H 2 May 2023 14:52 UTC
LW: 12 AF: 4
I agree, and I made a related claim that alignment of a model is mostly a type error. I think the same point applies to safety more generally: modulo exotic concerns about inner-optimizers, there is no such thing as an unsafe model, only unsafe applications of that model.

Though I also think the red teaming that ARC did on GPT-4 did successfully demonstrate that GPT-4 is unlikely to fail in certain dangerous ways when integrated into any system which is not already dangerously capable in its own right. This is simultaneously a very impressive safety result for a novel AI model, and one that would not be very impressive in any other domain. For example, I’m not too impressed if a car manufacturer demonstrates that their car’s entertainment system never takes control of the car and drives it off a cliff under any circumstances. (Though I still hope there is some auto manufacturing safety standard which covers such a case.)
- Davidmanheim 2 May 2023 16:26 UTC
  8 points
  Mostly agree.
  I will note that correctly isolating the entertainment system from the car control system is one of those things you’d expect manufacturers to get right, but you’d be disappointed. Safety is hard.