I think that the superhuman coder probably doesn’t have that good a chance of betraying us. How do you think it would do so? (See “early schemers’ alternatives to making deals”.)
Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.
I don’t believe that an AI that’s not capable of automating ML research or doing most remote work is going to be able to do that!
It’s an open question, but we’ll find out soon enough. Thanks.
I think that the superhuman coder probably doesn’t have that good a chance of betraying us. How do you think it would do so? (See “early schemers’ alternatives to making deals”.)
Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.
I don’t believe that an AI that’s not capable of automating ML research or doing most remote work is going to be able to do that!
It’s an open question, but we’ll find out soon enough. Thanks.