To be honest, I wasn’t really pointing at you when I made the comment, more at the practice of hedging and qualifying. I want to emphasise that (from the evidence publicly available to me) I think you have internalised your beliefs far more than the people the author collects into the “uniparty”. I think you have acted with courage in support of your convictions, especially in the face of the NDA situation, for which I hold immense respect. It could not have been easy to leave when you did.
However, my interpretation of what the author is saying is that beliefs like “I think what these people are doing might seriously end the world” are in a sense fundamentally difficult to square with measured reasoning and careful qualifiers. The end of the world and existential risk are by their nature such totalising and awful ideas that any “sane” engagement with them (as in, trying to set measured bounds and build sensible models) is deeply epistemically unsound: the equivalent of arguing over whether 1e8 + 14 or 1e8 + 17 people (3 extra lives!) will be the true number of casualties in some kind of planetary extinction event, when the error bars are themselves ±1e5 or ±1e6. (We are, after all, dealing with never-before-seen black swan events.)
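To make the scale mismatch in that analogy concrete, here is a throwaway back-of-the-envelope sketch (a toy illustration using the made-up numbers above, not an estimate of anything):

```python
# Toy illustration: the quantity being argued over is a few lives,
# while the stated uncertainty is on the order of 1e5 to 1e6 lives.
estimate_a = 1e8 + 14   # one side of the hypothetical argument
estimate_b = 1e8 + 17   # the other side (3 extra lives)
error_bar = 1e5         # the smaller of the quoted error bars

disputed = estimate_b - estimate_a
print(f"Disputed difference: {disputed:.0f} lives")
print(f"Stated uncertainty:  {error_bar:.0f} lives")
print(f"Disputed difference as a share of the error bar: {disputed / error_bar:.4%}")
```

The quantity being debated is a vanishingly small fraction of the acknowledged uncertainty, which is the whole point.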
In this sense, detailed debates about which metrics to include in a takeoff model, the precise slope of the METR exponential curve, and which combination of chip-trade and export policies increases tail risk the most or least are themselves a kind of deception. Arguing over the details implies that our world models and risk models have more accuracy and precision than they actually do, and in turn that we have more control over events than we actually do. “Directionally correct” is in fact the most accuracy we’re going to get, because (per the author) Silicon Valley isn’t actually executing some carefully calculated, compute-optimal RSI takeoff launch sequence with a well-understood theory of learning. The AGI “industry” is more like a group of people pulling the lever of a slot machine over and over and over again, egged on by a crowd of eager onlookers, spending down the world’s collective savings accounts until one of them wins big. By “win big”, of course, I mean “unleashes a fundamentally new kind of intelligence into the world”. And each of them may do it for different reasons, and some of them may in their heads actually have some kind of master plan, but all it looks like from the outside is ka-ching, ka-ching, ka-ching, ka-ching...
OK, thanks! It sounds like you are saying that I shouldn’t be engaged in research projects like the AI Futures Model, AI 2027, etc.? On the grounds that they are deceptive, by implying that the situation is more under control, more normal, more OK than it is?
I agree that we should try to avoid giving that impression. But I feel like the way forward is to still do the research but then add prominent disclaimers, rather than abandon the research entirely.
Silicon Valley isn’t actually executing some carefully calculated, compute-optimal RSI takeoff launch sequence with a well-understood theory of learning. The AGI “industry” is more like a group of people pulling the lever of a slot machine over and over and over again, egged on by a crowd of eager onlookers, spending down the world’s collective savings accounts until one of them wins big. By “win big”, of course, I mean “unleashes a fundamentally new kind of intelligence into the world”. And each of them may do it for different reasons, and some of them may in their heads actually have some kind of master plan, but all it looks like from the outside is ka-ching, ka-ching, ka-ching, ka-ching...
I agree with this fwiw.
Just to be clear, while I “vibe very hard” with what the author says on a conceptual level, I’m not directly calling for you to shut down those projects. I’m trying to explain what I think the author sees as a problem within the AI safety movement. Because I am talking to you specifically, I am using the immediate context of your work, but only as a frame, not as a target. I found AI 2027 engaging and a good representation of one model of how takeoff might happen, and I thought it was designed and written well (tbh my biggest quibble is “why isn’t it called AI 2028?”). The author is very, very light on actual positive “what we should do” policy recommendations, so if I talked about that I would be filling in with my own takes, which probably differ from the author’s in several places. I am happy to do that if you want, though probably not publicly in a LW thread.
@Daniel Kokotajlo Addendum:
Finally, my interpretation of “Chapter 18: What Is to Be Done?” (and the closest I will come to answering your question based on the author’s theory/frame) is something like “the AGI-birthing dynamic is not a rational dynamic, therefore it cannot be defeated by policies or strategies built around rational action”. Furthermore, since each actor wants to believe that their contribution to the dynamic is locally rational (if I don’t do it someone else will / I’m counterfactually helping / this intervention will be net positive / I can use my influence for good at a pivotal moment [...] pick your argument), further arguments about optimally rational policies only encourage the delusion that everyone is acting rationally, making the actors dig in their heels further.
The core emotions the author points to as motivating the AGI dynamic are: the thrill of novelty/innovation/discovery; paranoia and fear about “others” (other labs/other countries/other people) achieving immense power; distrust of the institutions, philosophies, and systems that underpin the world; and a sense of self-importance/destiny. All of these can be justified with intellectual arguments, but they are often the bottom line that comes before such arguments are written. On the other hand, the author also shows how poor emotional understanding and estrangement from one’s emotions and intuitions lead people to get trapped by faulty but extremely sophisticated logic. Basically, emotions and intuitions offer first-order heuristics in the massively high-dimensional space of possible actions/policies, and when you cut off the heuristic system you are vulnerable to high-dimensional traps and false leads that your logic or deductive abilities are insufficient to extract you from.
Therefore, the answer the author is pointing at is something like an emotional or frame-realignment challenge. You don’t start arguing with a suicidal person about why the logical reasons they have offered for jumping don’t make sense (at least, you don’t do this if you want them to stay alive); instead, you try to move them to a different emotional frame or state (i.e., calming them down and showing them there is a way out). Though he leaves it very vague, he seems to believe the world will also need such a fundamental frame shift or belief-reinterpretation to actually exit this destructive dynamic, a shift whose magnitude he likens to a religious revelation and compares to the redemptive power of love. Beyond this point I would be filling in my own interpretation and I will stop there, but I have a lot more thoughts about this (especially the idea of love/coordination/ends to Moloch).