I’m happy to see a demonstration that Eliezer has a good understanding of the top-level issues involving computer security.
One thing I wonder about, though: why isn’t making Internet security better across the board a more important goal in the rationality community? Although very difficult (for reasons illustrated here), it seems immediately useful and also a good prerequisite for any sort of AI security. If we can’t secure the Internet against nation-state-level attacks, what hope is there against an AI that falls into the wrong hands?
In particular, building “friendly AI” and assuming it will remain friendly seems naive at best, since it will be copied and then the friendly part will be modified by hostile actors.
It seems like someone with a security mindset will want to avoid making any assumption of friendliness and instead work on making critical systems that are simple enough to be mathematically proven secure. I wonder why this quote (from the previous post) isn’t treated as a serious plan: “If your system literally has no misbehavior modes at all, it doesn’t matter if you have IQ 140 and the enemy has IQ 160—it’s not an arm-wrestling contest.”
We are far from being able to build these systems, but it still seems like a more plausible research project than ensuring that nobody in the world makes unfriendly AI.
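To make the “no misbehavior modes” idea slightly more concrete, here is a toy sketch (my illustration only, and nowhere near a mathematical proof): encode the legal states of a component in the type system so that the forbidden behavior simply cannot be expressed. All names below are invented for the example.

```rust
// Hypothetical sketch: a connection must be authenticated before it can run
// commands. The misbehavior mode "run a command while unauthenticated" is
// unrepresentable, because no method with that effect exists on the
// unauthenticated type.
use std::marker::PhantomData;

struct Unauthenticated;
struct Authenticated;

struct Connection<State> {
    peer: String,
    _state: PhantomData<State>,
}

impl Connection<Unauthenticated> {
    fn new(peer: &str) -> Self {
        Connection { peer: peer.to_string(), _state: PhantomData }
    }

    // The only way to obtain an authenticated connection is through this check.
    // (A hard-coded password, obviously just for the toy example.)
    fn authenticate(self, password: &str) -> Result<Connection<Authenticated>, Self> {
        if password == "correct horse battery staple" {
            Ok(Connection { peer: self.peer, _state: PhantomData })
        } else {
            Err(self)
        }
    }
}

impl Connection<Authenticated> {
    fn run_command(&self, cmd: &str) {
        println!("running {:?} for {}", cmd, self.peer);
    }
}

fn main() {
    let conn = Connection::<Unauthenticated>::new("10.0.0.7");
    // conn.run_command("ls");  // does not compile: no such method in this state
    if let Ok(conn) = conn.authenticate("correct horse battery staple") {
        conn.run_command("ls");
    }
}
```

Type checking is a very weak cousin of a full proof of security, but the flavor is the same: the bad behavior is ruled out before the program ever runs, regardless of how clever the attacker is.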
In particular, building “friendly AI” and assuming it will remain friendly seems naive at best, since it will be copied and then the friendly part will be modified by hostile actors.
This is correct. Any reasonable AGI development strategy must have strong closure and security measures in place to minimize the risk of leaks, and deployment has to meet the conditions in Dewey (2014).
It seems like someone with a security mindset will want to avoid making any assumption of friendliness and instead work on making critical systems that are simple enough to be mathematically proven secure.
This is also correct, if the idea is to ensure that developers understand their system (and safety-critical subsystems in particular) well enough for the end product to be “friendly”/“aligned.” If you’re instead saying that alignment isn’t a good target to shoot for, then I’m not sure I understand what you’re saying. Why not? How do we achieve good long-term outcomes without routing through alignment?
I think odds are good that, assuming general AI happens at all, someone will build a hostile AI and connect it to the Internet. I think a proper understanding of the security mindset is that the assumption “nobody will connect a hostile AI to the Internet” is something we should stop relying on. (In particular, maintaining secrecy and international cooperation both seem unlikely to hold. We shouldn’t assume they will work.)
We should be looking for defenses that aren’t dependent on the IQ level of the attacker, similar to how mathematical proofs are independent of IQ. AI alignment is an important research problem, but it doesn’t seem directly relevant here.
In particular, I don’t see why you think “routing through alignment” is important for making sound mathematical proofs. Narrow AI should be sufficient for making advances in mathematics.
I think odds are good that, assuming general AI happens at all, someone will build a hostile AI and connect it to the Internet. I think a proper understanding of the security mindset is that the assumption “nobody will connect a hostile AI to the Internet” is something we should stop relying on. (In particular, maintaining secrecy and international cooperation both seem unlikely to hold. We shouldn’t assume they will work.)
Yup, all of this seems like the standard MIRI/Eliezer view.
In particular, I don’t see why you think “routing through alignment” is important for making sound mathematical proofs. Narrow AI should be sufficient for making advances in mathematics.
I don’t know what the relevance of “mathematical proofs” is. Are you talking about applying formal methods of some kind to the problem of ensuring that AGI technology doesn’t leak, and saying that AGI is unnecessary for this task? I’m guessing that part of the story you’re missing is that proliferation of AGI technology is at least as much about independent discovery as it is about leaks, splintering, or espionage. You have to address those issues, but the overall task of achieving nonproliferation is much larger than that, and it doesn’t do a lot of good to solve part of the problem without solving the whole problem. AGI is potentially a route to solving the whole problem, not to solving the (relatively easy, though still very important) leaks/espionage problem.
I mean things like using mathematical proofs to ensure that Internet-exposed services have no bugs that a hostile agent might exploit. We don’t need to be able to build an AI to improve defenses.
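To make that concrete, here is a hedged sketch of the weakest version of the idea (the wire format is invented for the example): write the parser for untrusted network bytes as a total function, so that every possible input maps to either a well-formed message or a clean error, with no panics and no out-of-bounds reads. Real verified software such as the seL4 kernel or the CompCert compiler goes much further and machine-checks this kind of property, but the shape of the guarantee is the same.

```rust
// Hypothetical wire format, invented for illustration: one length byte
// followed by that many payload bytes. The parser is total: no panics,
// no out-of-bounds access (slice access goes through the checked `get`).

#[derive(Debug)]
struct Message<'a> {
    payload: &'a [u8],
}

#[derive(Debug)]
enum ParseError {
    Empty,
    Truncated { expected: usize, actual: usize },
    TrailingGarbage,
}

fn parse(input: &[u8]) -> Result<Message<'_>, ParseError> {
    let (&len, rest) = match input.split_first() {
        Some(pair) => pair,
        None => return Err(ParseError::Empty),
    };
    let expected = len as usize;
    match rest.get(..expected) {
        Some(payload) if rest.len() == expected => Ok(Message { payload }),
        Some(_) => Err(ParseError::TrailingGarbage),
        None => Err(ParseError::Truncated { expected, actual: rest.len() }),
    }
}

fn main() {
    assert!(parse(&[]).is_err());
    assert!(parse(&[3, 1, 2]).is_err()); // claims 3 payload bytes, has only 2
    assert_eq!(parse(&[2, 10, 20]).unwrap().payload, &[10u8, 20]);
}
```

A proof assistant would let you go beyond “doesn’t crash” to “accepts exactly the well-formed messages,” which is where most of the security value lives.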
There’s no “friendly part” in an AGI in the same way there’s no “secure part” in an OS. The kind of friendliness and security we want is deep in the architecture.
Most hostile actors also don’t want an AGI that kills them. They might still do nasty things with the AGI by giving it bad goals, but that’s not the core of what the AGI argument is about.
As for removing misbehavior modes, that’s already done in security circles: critical computers get air-gapped to prevent them from being hacked.
In the quest for better security, Mozilla built a new programming language, Rust, which prevents a class of errors that C has.
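For readers who haven’t seen it, a minimal sketch of what that error class looks like in practice (my toy example, not from the comment above): in C, indexing past the end of a buffer or reading freed memory silently corrupts state and is a classic exploit primitive; in safe Rust, the same patterns are either a compile error or a checked, well-defined failure.

```rust
// Two of the error classes that safe Rust rules out (toy example).

fn main() {
    let buf = vec![0u8; 4];

    // 1. Out-of-bounds access: C happily reads whatever sits at buf[10];
    //    the checked accessor in Rust just returns None.
    assert_eq!(buf.get(10), None);

    // 2. Use-after-free: freeing a buffer and then reading it is undefined
    //    behavior in C. Uncommenting the line below is a compile error in
    //    Rust ("borrow of moved value"), so the broken program never gets built.
    drop(buf);
    // println!("{}", buf[0]);

    println!("no out-of-bounds reads, no use-after-free");
}
```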
Even if there’s no “friendly part,” it seems unlikely that someone who learns the basic principles behind building a friendly AI will be unable to build an unfriendly AI by accident. I’m happy that we’re making progress with safe languages, but there is no practical programming language in which it’s the least bit difficult to write a bad program.
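As a trivial illustration of that last point (my own toy example, with the destructive line commented out): the following is perfectly “safe” Rust, with nothing for the borrow checker to object to, and it is still exactly the kind of program you don’t want running.

```rust
use std::fs;

// Memory-safe and borrow-checker-approved, and still hostile in intent:
// language-level safety says nothing about whether the program's goal is good.
fn main() {
    for entry in fs::read_dir(".").expect("cannot read directory") {
        if let Ok(entry) = entry {
            // The destructive call is commented out so the sketch is harmless to run.
            // let _ = fs::remove_file(entry.path());
            println!("would delete {:?}", entry.path());
        }
    }
}
```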
It would make more sense to assume that at some point, a hostile AI will get an Internet connection, and figure out what needs to be done about that.
See also Paul Christiano’s “Security and AI Alignment,” which Eliezer agrees with if I recall correctly.