So there’s a nice analogy to MIRI’s work, where we’re trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.
Except we’re not; we’re trying to get adequate guarantees, which is much harder.
The main reason I object to “safe AI” is the image it implies of, “Oh, well, AIs might be dangerous because, you know, AIs are naturally dangerous for some mysterious reason, so instead you have to build a class of AIs that can never harm people because they have the First Law of Robotics, and then we’re safe.”
Which is just not at all what the technical research program is about.
Which isn’t at all what the bigger picture looks like. The vast majority of self-improving agents have utility functions indifferent to your existence; they do not hate you, nor do they love you, and you are made of atoms they can use for something else. If you don’t want that to happen you need to build, from the ground up, an AI that has something so close to your normalized / idealized utility function as to avert all perverse instantiation pathways.
There isn’t a small class of “threat” pathways that you patch, or a conscience module that you install, and then you’re left with an AI that’s like the previous AI but safe, like a safe paperclip maximizer that doesn’t harm humans. That’s not what’s happening here.
It sounds like you’re nervous about some unspecified kind of bad behavior from AIs, like someone nervous in an unspecified way about, oh, say, genetically modified foods, and then you want “safe foods” instead, or you want to slap some kind of wacky crackpot behavior-limiter on the AI so it can never threaten you in this mysterious way you worry about.
Which brings us to the other image problem: you’re using a technophobic codeword, “safe”.
Imagine somebody advocating for “safe nuclear power plants, instead of the nuclear plants we have now”.
If you’re from a power plant company, the anti-nuclear advocates are like, “Nice try, but we know that no matter what kind of clever valve you’re putting on the plant, it’s not really safe.” Even the pro-nuclear people would quietly grit their teeth and swallow their words, because they know, but cannot say, that this safety is not perfect. I can’t imagine Bruce Schneier getting behind any cryptographic initiative that was called “safe computing”; everyone in the field knows better, and in that field they’re allowed to say so.
If you’re not from a power plant company—which we’re not, in the metaphor—and instead look more like some kind of person making a bunch of noise about social interests, then the pro-nuclear types, who see the entire global warming problem as being caused by anti-nuclear idiots giving us all these coal-burning plants, think that you’re trying to call your thing “safe” to make our on-the-whole good modern nuclear power plants sound “unsafe” by contrast, and that you’ll never be satisfied until everything is being done your way.
Most of our supporters come from technophilic backgrounds. The fundamental image that a technophile has of a technophobe / neo-Luddite is that when a technophobe talks about “safety”, their real agenda is to demand unreasonable levels of safety, to keep raising the bar until the technology is driven to near-extinction, all in the name of “safety”. They’re aware of how they lost the fight for nukes. They’re aware that “You’re endangering the children!” is a memetic superweapon, and they regard anyone who resorts to “You’re endangering the children!” as a defector against their standards of epistemic hygiene. You know how so many people think that MIRI is arguing that we ought to take these crazy expensive measures because, if there’s even a chance that AI is dangerous, we ought to do these things, even though I’ve repeatedly repudiated that kind of reasoning at every possible juncture? It’s because they’ve been primed to expect attack with a particular memetic superweapon.
When you say “Safe AI”, that’s what a technophile thinks you’re preparing to do—preparing to demand expensive, unnecessary measures and assert your own status over real scientists, using a “You’re endangering the children!” argument that requires unlimited spending on tiny risks. They’ve seen it over, and over, and over again; they’ve seen it with GMOs and nuclear power and the FDA regulating drug development out of existence.
“Safety” is a word used by their enemies that means “You must spend infinite money on infinitesimal risks.” Again, this is the fight they’ve seen the forces of science and sanity lose, over and over again.
Take that phenomenon, combine it with the fact that what we want is not remotely like a conscience module slapped onto exogenously originating magical threat-risks from otherwise okay AIs, and combine it with people knowing perfectly well that your innovations do not make AI truly perfectly safe, and “safe AI” does not sound like a good name to me. Talking about how we want the “best possible” “guarantee” is worse.
“Friendly AI” is there to just not sound like anything, more or less, and if we want to replace it with a more technical-sounding term, it should perhaps also not sound like anything. Maybe we can go back to Greek or Latin roots.
Failing that, “high-assurance AI” at least sounds more like what we actually do than “safe AI”. It doesn’t convey the concept that low-assurance AIs automatically kill you with probability ~1, but at least you’re not using a codeword that people know from anti-GMO campaigns, and at least the corresponding research process someone visualizes sounds a bit more like what we actually do (having to design things from scratch to support certain guarantees, rather than slapping a safety module onto something that already exists).
After thinking and talking about it more, I still think “AGI safety” is the best term I’ve got so far. Or, “AI safety,” in contexts where we don’t mind being less specific, and are speaking to an audience that doesn’t know what “AGI” means.
Basically, (1) I think your objections to “safe AGI” mostly don’t hold for “AGI safety,” and (2) I think the audience you seem most concerned about (technophiles) isn’t the right audience to be most concerned about.
Maybe Schneier wouldn’t get behind something called “safe computing” or “secure computing,” but he happily works in a field called “computer security.” The latter phrasing suggests the idea that we can get some degree of security (or safety) even though we can never make systems 100% safe or secure. Scientists don’t object to people working on “computer security,” and I haven’t seen technophiles object to it either. Heck, many of them work in computer security. “X security” and “X safety” don’t imply to anyone I know that “you must spend infinite money on infinitesimal risks.” It just implies you’re trying to provide some reasonable level of safety and security, and people like that. Technophiles want their autonomous car to be reasonably safe just like everyone else does.
I think your worry that “safety” implies there’s a small class of threat pathways that need to be patched, rather than implying that an AGI needs to be designed from the ground up to stably optimize for your idealized values, is more of a concern. But it’s a small concern. A term like “Friendly AI” is a non-starter for many smart and/or influential people, whereas “AGI safety” serves as a rung in Wittgenstein’s ladder from which you can go on to explain that the challenge of AGI safety is not to patch a small class of threat pathways but instead to build a system from the ground up to ensure desirable behavior.
(Here again, the analogy to other safety-critical autonomous systems is strong. Such systems are often, like FAI, built from the ground up for safety and/or security precisely because in such autonomous systems there isn’t a small class of threat pathways. Instead, almost all possible designs you might come up with don’t do what you intended in some system states or environments. See e.g. my interviews with Michael Fisher and Benjamin Pierce. But that’s not something even most computer scientists will know anything about — it’s an approach to AI safety work that would have to be explained after they’ve already got a foot on the “AGI safety” rung of the expository ladder.)
Moreover, you seem to be most worried about how our terminology will play to the technophile audience. But playing well to technophiles isn’t MIRI’s current or likely future bottleneck. Attracting brilliant researchers is. If we can attract brilliant researchers, funding (from technophiles and others) won’t be so hard. But it’s hard to attract brilliant researchers with a whimsical home-brewed term like “Friendly AI” (especially when it’s paired with other red flags like a shockingly-arrogant-for-academia tone and an apparent lack of familiarity with related work, but that’s a different issue).
As Toby reports, it’s also hard to get the ear of policy-makers with a term like “Friendly AI,” but I know you are less interested in reaching policy-makers than I am.
Anyway, naming things is hard, and I certainly don’t fault you (or was it Bostrom?) for picking “Friendly AI” back in the day, but from our current vantage point we can see better alternatives. Even LWers think so, and I’d expect them to be more sympathetic to “Friendly AI” than anyone else.
Except we’re not; we’re trying to get adequate guarantees...
Sure, that’s a more accurate phrasing. Though I don’t understand how “adequate guarantees” can be harder than “strongest guarantees possible.” Anyway, you can substitute “adequate guarantees” into my sentence and it still makes the same point I wanted to make with that sentence, and still makes the analogy to contemporary high assurance systems.
The main reason I object to “safe AI” is the image it implies of...
That’s roughly why I prefer “AGI safety” to “safe AGI.” What do you think of “AGI safety” compared to “Safe AGI”?
Which brings us to the other image problem: you’re using a technophobic codeword...
I raised this in the OP and my response was “I’ve not actually witnessed this in reality, and contemporary AI safety researchers seem to be doing fine when they use the word ‘safety’.”
“Friendly AI” is there to just not sound like anything, more or less, and if we want to replace it with a more technical-sounding term, it should perhaps also not sound like anything.
I think these days it sounds like a companion robot, which didn’t really exist when the term was invented. But even then it might have sounded like C-3PO. I do like the not-sound-like-anything approach, though. Possibly via Greek or Latin roots, as you say. Certus-AI (“dependable” in Latin), or something like that.
I’ll say again, “high-assurance AI” better captures everything you described than “AI safety”.
Unfortunately there’s cross-contamination with “certifiable”, which is NOT a label you want associated with an AI :-D