There’s no “friendly part” in an AGI in the same way there’s no “secure part” in an OS. The kind of friendliness and security we want is deep in the architecture.
Most hostile actors also don’t want an AGI that kills them. They might still do nasty things with the AGI by giving it bad goals, but that’s not the core of what the AGI argument is about.
As for removing misbehavior modes, that’s standard practice in security circles. Critical computers get air-gapped to prevent them from being hacked.
In the quest for more security, Mozilla built a new programming language, Rust, that prevents a class of memory-safety errors that C had.
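To illustrate the kind of error class meant here: in C, reading past the end of a buffer is undefined behavior and may silently corrupt memory, while Rust makes the same access either a compile error or a checked, recoverable result. A minimal sketch (the variable names are just for illustration):

```rust
fn main() {
    let buf = [1u8, 2, 3];

    // Safe indexed access: .get() returns an Option instead of
    // reading out of bounds like C's buf[10] would.
    assert_eq!(buf.get(2), Some(&3));
    assert_eq!(buf.get(10), None); // out-of-bounds read is caught, not undefined behavior

    println!("out-of-bounds access rejected");
}
```

The point is architectural, matching the comment above: the safety isn’t a “secure part” bolted on; bounds checking is woven into the language’s core access primitives.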
Even if there’s no “friendly part,” it seems unlikely that someone who learns the basic principles behind building a friendly AI will be unable to build an unfriendly AI by accident. I’m happy that we’re making progress with safe languages, but there is no practical programming language in which it’s the least bit difficult to write a bad program.
It would make more sense to assume that at some point, a hostile AI will get an Internet connection, and figure out what needs to be done about that.
See also Paul Christiano’s “Security and AI Alignment,” which Eliezer agrees with if I recall correctly.