Everything’s normal until it’s not
Crossposted from the EA Forum: https://forum.effectivealtruism.org/posts/2hduXN5MXCZPqKjSv/everything-s-normal-until-it-s-not
Note: This is a quick write-up of some thoughts I’ve had recently about communicating AI risk.
There are two claims I wish to argue for concerning understanding and communicating AI risk. Neither is entirely original, but I think it's worth articulating them clearly now that AI progress is starting to gain more public attention. The first is that communicating the difficulty of the different sub-areas of AI safety tends to involve many abstractions, which can be confusing, as well as future scenarios that may sound far-fetched at the moment. In this post, I'll assume that abstractions are conducive to reaching scientific insights. The second claim is a prediction: in the coming years, everything will be “normal” until something bad happens. I'll explain what “normal” means, try to make specific bets, and challenge myself to be as epistemically virtuous as possible.
Talking in terms of abstractions
I see technological forecasting as a rigorous version of philosophical thought experimenting. Writing a concrete AI risk story, for example, entails giving a lot of detail about a possible future world, near-term or long-term, in which AI capabilities impact society at large in different ways. Stories of this sort require abstractions, just like the rest of the research in AI safety. And if we want to be completely honest, this is not particularly different from the rest of science: we simply happen to care a lot more about understanding the various failure modes of advanced AIs, and we find ourselves scarily confused.
Notice that by “abstraction” I mean both the process of deriving a general concept from specific instances and the property of an idea of being abstract, i.e., not concrete or instantiated. I think these two senses are easy to conflate: they are closely related, but they don't technically overlap. The problem with abstractions is that they create a kind of map vs. territory gap. The better our abstractions become, the more likely it is that we will get things right about the world. In the meantime, we find ourselves in a dark forest of unprocessed conceptual tools that we try to manipulate so that we can move forward. We are, however, still in the dark.
While in that dark forest, we use these abstractions to describe our situation. We try to construct explanatory frameworks and make predictions. We incorporate our abstractions into the stories we come up with to better understand the world. The technical analyses of AI safety inform how we communicate the risks when explaining them to those outside the field. This is challenging by default, since science communication requires a deep understanding of the actual problems. My claim, which may strike one as trivial, is that the abstractions concerning the future of AI make the risks hard to explain to outsiders. Again, this phenomenon is not different from the rest of science. What is different is that we desperately need to communicate the potential for AI risk to social actors who could influence AI research, strategy, governance, and so on.
Reality does not operate with abstractions; those are just our map. When the bad thing happens, it will be as concrete as it can get, just like all disasters.[1] What we think of as a mere abstraction collapses in the face of things happening out there.
A kind of “normal” for the most important century
If I had to bet, I’d say we’ve already been living in this kind of “normal” for a while now. The deep learning revolution that has given us high-quality Google Translate, Netflix recommendation algorithms, Roombas, and all the other clever tools we’ve been using makes me predict that as AI models become more and more powerful and exhibit impressive capabilities, most of us will become increasingly less sensitive to their impressiveness. In favor of this prediction: plenty of commentators have already circulated the claim that models that speak English or solve math problems aren’t really intelligent, not to mention the recent piece by Noam Chomsky arguing that “true intelligence is also capable of moral thinking”.[2]
Current language models have mastered the ability to compose all sorts of text, in many cases at a human level. I often joke that ChatGPT writes like an average college student. And we already think this is “normal”, the normal of the most important century. We’ll see models performing all sorts of tasks in the near future (if not already): models that design scientific experiments, give accurate medical advice, make people fall in love with them, and so on. And we’ll still hear voices saying “well, that was only a mediocre experiment”, or “this was a pretty trivial piece of advice”, or “it’s just predicting what you want to hear to make you like it and spitting that out”. Our kind of normal is the world where deep learning mysteriously works and we are quick to take its benefits for granted, until we reach a point of no return.
An invitation to think in slow-motion
AI progress is fast and will most likely continue at this pace. Yet we need to understand it and do our best to minimize the risk. On that note, take a moment to reflect on what counts as normal, and ask what would be different if we weren’t “racing to the precipice”. What kind of signs would you want to observe to be convinced that this is a different world?
[1] Katja Grace said something similar during her talk at EAG Oakland 2023. I cannot recall the exact quote, but this is the gist.
[2] If we follow this claim, a deceptively aligned TAI wiping out humanity would not be truly intelligent. As if this would make any difference.