I think it might be a dangerous assumption that training the model better makes it in any way less problematic to connect to the internet. If there is an underlying existential danger, it likely comes from capabilities that we don’t expect or understand before letting the model loose. In some sense, you would expect a model with obvious flaws to be strictly less dangerous (in the global sense that matters) than a more refined one.
I agree. That line was mainly meant to say that even when training leads to very obviously bad and unintended behaviour, that still wouldn’t deter people from pushing the frontier of model-accessible power, for example by hooking the model up to the internet. It was more of a meta point on security mindset than on object-level risks, made within the frame that the same people would almost definitely consider a model with fewer obvious flaws to be unconditionally less dangerous.