AI risk is just a shorthand for “accidental technical AI risk.”
I don’t think “AI risk” was originally meant to be a shorthand for “accidental technical AI risk”. The earliest considered (i.e., not off-hand) usage I can find is in the title of Luke Muehlhauser’s AI Risk and Opportunity: A Strategic Analysis, where he defined it as “the risk of AI-caused extinction”.
(He used “extinction”, but nowadays we tend to think in terms of “existential risk”, which also includes “permanent large negative consequences”; that seems like a reasonable expansion of “AI risk”.)
However, I disagree with the idea that we should expand the term “AI risk” to include philosophical failures and intentional risks.
I want to include philosophical failures, as long as the consequences of the failures flow through AI, because (aside from historical usage) technical problems and philosophical problems blend into each other, and I don’t see a point in drawing an arbitrary and potentially contentious border between them. (Is UDT a technical advance or a philosophical advance? Is defining the right utility function for a Sovereign Singleton a technical problem or a philosophical problem? Why force ourselves to answer these questions?)
As for “intentional risks”, it’s already common practice to include those in “AI risk”:
Dividing AI risks into misuse risks and accident risks has become a prevailing approach in the field.
Besides that, I think there’s also a large grey area between “accident risk” and “misuse” where the risk partly comes from technical/philosophical problems and partly from human nature. For example, humans might be easily persuaded by wrong but psychologically convincing moral/philosophical arguments that AIs can come up with, and then order their AIs to do terrible things. Even pure intentional risks might have technical solutions. Again, I don’t really see the point of trying to figure out which of these problems should be excluded from “AI risk”.
It becomes unclear in conversation what people mean when they say AI risk
It seems perfectly fine to me to use that as shorthand for “AI-caused x-risk” and use more specific terms when we mean more specific risks.
Like The Singularity, it becomes a buzzword
What do you mean? Like people will use “AI risk” when their project has nothing to do with “AI-caused x-risk”? Couldn’t they do that even if we define “AI risk” to be “accidental technical AI risk”?
Journalists start projecting Terminator scenarios onto the words, and now have justification because even the researchers say that AI risk can mean a lot of different things.
Terminator scenarios seem to be scenarios of “accidental technical AI risk” (they’re just not very realistic scenarios) so I don’t see how defining “AI risk” to mean that would prevent journalists from using Terminator scenarios to illustrate “AI risk”.
It puts a whole bunch of types of risk into one basket, suggesting to outsiders that all attempts to reduce “AI risk” might be equally worthwhile.
I don’t think this is a good argument, because even within “accidental technical AI risk” there are different problems that aren’t equally worthwhile to solve, so why aren’t you already worried about outsiders thinking all those problems are equally worthwhile?
ML researchers start to distrust AI risk researchers, because people who are worried about the Terminator are using the same words as the AI risk researchers and therefore get associated with them.
See my response above regarding “Terminator scenarios”.
This can all be avoided by having a community norm to clarify that we mean technical accidental risk when we say AI risk, and when we’re talking about other types of risks we use more precise terminology.
I propose that we instead stick with historical precedent and keep “AI risk” to mean “AI-caused x-risk” and use more precise terminology to refer to more specific types of AI-caused x-risk that we might want to talk about. Aside from what I wrote above, it’s just more intuitive/commonsensical that “AI risk” means “AI-caused x-risk” in general instead of a specific kind of AI-caused x-risk.
However, I appreciate that someone who works mostly on the less philosophical / less human-related problems might find it tiresome to say or type “technical accidental AI risk” all the time to describe what they do or to discuss the importance of their work, and can find it very tempting to just use “AI risk”. It would probably be good to create a (different) shorthand or acronym for it, to remove this temptation and to make their lives easier.
I appreciate the arguments, and I think you’ve mostly convinced me, mostly because of the historical argument.
I do still have some remaining apprehension about using “AI risk” to describe every type of risk arising from AI.
I want to include philosophical failures, as long as the consequences of the failures flow through AI, because (aside from historical usage) technical problems and philosophical problems blend into each other, and I don’t see a point in drawing an arbitrary and potentially contentious border between them.
That is true. The way I see it, UDT is definitely on the technical side, even though it incorporates a large amount of philosophical background. When I say technical, I mostly mean “specific, uses math, has clear meaning within the language of computer science” rather than a more narrow meaning of “is related to machine learning” or something similar.
My issue with arguing for philosophical failure is that, as I’m sure you’re aware, there’s a well-known failure mode of worrying about vague philosophical problems rather than more concrete ones. Within academic philosophy, the majority of discussion surrounding AI centers on consciousness, intentionality, whether it’s possible to even construct a human-like machine, whether machines should have rights, etc.
There’s a unique thread of philosophy, including work on decision theory, that arose from LessWrong and doesn’t focus on these thorny and low-priority questions. While I’m comfortable with you arguing that philosophical failure is important, my impression is that the overly philosophical approach used by many people has done more harm than good for the field in the past, and continues to do so.
It is therefore sometimes nice to tell people that the problems that people work on here are concrete and specific, and don’t require doing a ton of abstract philosophy or political advocacy.
I don’t think this is a good argument, because even within “accidental technical AI risk” there are different problems that aren’t equally worthwhile to solve, so why aren’t you already worried about outsiders thinking all those problems are equally worthwhile?
This is true, but my impression is that when you tell people that a problem is “technical” it generally makes them refrain from having a strong opinion before understanding a lot about it. “Accidental” also reframes the discussion by reducing the risk of polarizing biases. This is a common theme in many fields:
Physicists sometimes get frustrated with people arguing about “the philosophy of the interpretation of quantum mechanics”, because there’s a large subset of people who think that since it’s philosophical, you don’t need any subject-level expertise to talk about it.
Economists try to emphasize that they use models and empirical data, because a lot of people think that their field of study is more-or-less just high status opinion + math. Emphasizing that there are real, specific models that they study helps to reduce this impression. Same with political science.
A large fraction of tech workers are frustrated by the use of Machine Learning as a buzzword right now, and part of it is that people started saying Machine Learning = AI rather than Machine Learning = Statistics, so a lot of people thought that even if they don’t understand statistics, they can understand AI, since that’s like philosophy and stuff.
Scott Aaronson has said:
But I’ve drawn much closer to the community over the last few years, because of a combination of factors: [...] The AI-risk folks started publishing some research papers that I found interesting—some with relatively approachable problems that I could see myself trying to think about if quantum computing ever got boring. This shift seems to have happened at roughly around the same time my former student, Paul Christiano, “defected” from quantum computing to AI-risk research.
My guess is that this shift in his thinking occurred because a lot of people started talking about technical risks from AI, rather than framing it as a philosophy problem, or a problem of eliminating bad actors. Eliezer has shared this viewpoint for years, writing in the CEV document,
Warning: Beware of things that are fun to argue.
reflecting the temptation to derail discussions about technical accidental risks.