Not sure if this is the right place to ask, instead of just googling it, but anyway: does anyone know the current state of AI security practices at DeepMind, OpenAI, and other such places? Like, did they estimate the probability of GPT-3 killing everyone before turning it on, do they have procedures for not turning something on, did they test those procedures by having someone impersonate an unaligned GPT and try to manipulate researchers, things like that?
No, I very strongly predict they did not do things like that. I expect they (perhaps implicitly) predicted with high confidence that GPT-3 would not have the capabilities needed to kill everyone.
Do they have plans to do something like that in the future?
I would assume that the safety teams plan to do this (certainly I plan to). It’s less clear what the opinions are outside of the safety teams.