This is a useful application of a probability map! If an important term has multiple competing definitions, create nodes for all of them, link the ones you consider important to a central p(doom) node (assuming you are interested in that concept), and let other people disagree with your assessment, but with a clearer sense of what they specifically disagree about.
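To make the structure concrete, here is a minimal sketch in Python, purely illustrative and not the tool's actual data model: the `Node` class, the example definitions, and the link weights are all my own invention, meant only to show what "nodes for competing definitions linked to a central p(doom) node" might look like.

```python
# Illustrative sketch only (assumed structure, not the probability-map tool's API):
# one node per competing definition of a term, each optionally linked to a
# central p(doom) node with a weight expressing how much you think that
# definition bears on the question.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    # Each link is (target node name, weight you assign to the connection).
    links: list[tuple[str, float]] = field(default_factory=list)

# Competing definitions of a term (here "AGI"), each as its own node.
definitions = [
    Node("AGI = human-level at most economically relevant tasks"),
    Node("AGI = able to autonomously pursue long-horizon goals"),
    Node("AGI = superhuman at AI research itself"),
]

# Link only the definitions you consider load-bearing to the central node;
# someone who disagrees can then point at a specific node or weight.
p_doom = Node("p(doom)")
definitions[1].links.append((p_doom.name, 0.7))
definitions[2].links.append((p_doom.name, 0.9))

for d in definitions:
    print(d.name, "->", d.links or "unlinked")
```

The point of the structure is that disagreement becomes localized: instead of arguing about "AGI risk" in the abstract, someone can object to a specific node or to the weight of a specific link.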
WillPetillo
The Techno-Pessimist Lens
Principles of AI Uncontrollability
It’s dangerous to calculate p(doom) alone! Take this.
Lenses, Metaphors, and Meaning
The basic contention here seems to be that the biggest dangers of LLMs come not from the systems themselves, but from the overreliance, excessive trust, etc. that societies and institutions place in them. Another contention is that "hyping LLMs" (which I assume includes folks here expressing concerns that AI will go rogue and take over the world) increases perceptions of AI's abilities, which feeds into this overreliance. A conclusion is that promoting "x-risk" as a reason for pausing AI will have the unintended side effect of increasing the (catastrophic, but not existential) dangers associated with overreliance.
This is an interesting idea, not least because it’s a common intuition among the “AI Ethics” faction, and therefore worth hashing out. Here are my reasons for skepticism:
1. The hype that matters comes from large-scale investors (and military officers) trying to get in on the next big thing. I assume these folks are paying more attention to corporate sales pitches than to Internet academics and people holding protest signs, and that their background point of reference is not Terminator but the FOMO common in the tech industry (which makes sense in a context where losing market share is a bigger threat than losing investment dollars).
2. X-risk scenarios are admittedly less intuitive in the context of LLMs based on self-supervised learning than they were back when reinforcement learning was at the center of development and AI learned to play increasingly broad ranges of games. Those systems regularly specification-gamed their environments, and it was chilling to think about what would happen when a system could treat the entire world as a game. A concern now, however, is that agency will make a comeback because it is economically useful. Imagine the brutal, creative effectiveness of RL combined with the broad-based common sense of SSL. This reintegration of agency (I can't speak to the specific architecture) into leading AI systems is what the tech companies are actively developing towards. More on this concept in my Simulators sequence.
I, for one, will find your argument more compelling if you (1) take a deep dive into AI development motivations, rather than lumping them all together as "hype", and (2) explain why AI development would stop at the current paradigm of LLM-fueled chatbots, or at something similarly innocuous in itself but potentially dangerous in the context of societal overreliance.
The motivation of this post was to design a thought experiment involving a fully self-sufficient machine ecology that remains within constraints designed to benefit something outside of the system, not to suggest how to make best use of the moon.
Project Moonbeam
Agreed: when discussing the alignment of simulators in this post, we are referring to safety from the subset of dangers related to unbounded optimization towards alien goals, which does not include everything within value alignment, let alone AI safety. But this qualification points to a subtle meaning drift in this post's use of the word "alignment" (towards something like "comprehension and internalization of human values"), which isn't good practice and is something I'll want to figure out how to edit soon.
Emergence of Simulators and Agents
Agents, Simulators and Interpretability
I am having difficulty seeing why anyone would regard these two viewpoints as opposed.
We discuss this indirectly in the first post in this sequence, which outlines what it means to describe a system through the lens of an agent, tool, or simulator. Yes, the concepts overlap, but there is nonetheless a kind of tension between them. In the case of agent vs. simulator, our central question is: which property is "driving the bus" with respect to the system's behavior, utilizing the other in its service?
The second post explores the implications of the above distinction, predicting different types of values (and thus behavior) from: an agent that contains a simulation of the world and uses it to navigate; a simulator that generates agents because such agents are part of the environment the system is modelling; and a system where the modes are so entangled that it is meaningless to even talk about where one ends and the other begins. Specifically, I would expect simulator-first systems to have wide value boundaries that internalize (an approximation of) human values, but narrower, maximizing behavior from agent-first systems.
Case Studies in Simulators and Agents
Aligning Agents, Tools, and Simulators
It seems to me that the most robust solution is to do it the hard way: know the people involved really well, both directly and via reputation among people you also know really well—ideally by having lived with them in a small community for a few decades.
Agents, Tools, and Simulators
Selection bias. Those of us who were inclined to consider working on outreach and governance have joined groups like PauseAI, StopAI, and other orgs. A few of us reach back on occasion to say “Come on in, the water’s fine!” The real head-scratcher for me is the lack of engagement on this topic. If one wants to deliberate on a much higher level of detail than the average person, cool—it takes all kinds to make a world. But come on, this is obviously high stakes enough to merit attention.
Anti-memes: x-risk edition
Thanks for the link! It’s important to distinguish here between:
(1) support for the movement,
(2) support for the cause, and
(3) active support for the movement (i.e. attracting other activists to show up at future demonstrations)
Most of the paper focuses on 1, and also on activists' beliefs about the impact of their actions. I am more interested in 2 and 3. To be fair, the paper gives some evidence for detrimental impacts on 2 in the Trump example. It's not clear, however, whether the nature of the cause matters here. Support for Trump is highly polarized and entangled with culture, whereas global warming (Hallam's cause) and AI risk (PauseAI's) have relatively broad but frustratingly lukewarm public support. There are also many other factors when looking past short-term onlooker sentiment to the larger question of effecting social change, which the paper readily admits in the Discussion section. I'd list these points, but they largely overlap with the points I made in my post...though it was interesting to see how much was speculative. More research is needed.
In any case, I bring up the extreme case to illustrate that the issue is far more nuanced than "regular people get squeamish, therefore net negative!" This is actually somewhat irrelevant to PauseAI in particular, because most of our actions are around public education and lobbying, and even the protests are legal and non-disruptive. I've been in two myself and have seen nothing but positive sentiment from onlookers (with the exception of the occasional "good luck with that!" snark). The hard part with all of these is getting people to show up. (This last paragraph is not a rebuttal to anything you have said; it's a reminder of context.)
My takeaway from this post is that there are several properties of relating that people expect to converge, but that in your case (and in some contexts) don't. With empathy, there are:
1. Depth of understanding of the other person’s experience
2. Negative judgment
3. Mirroring
I mention 3 because I think it’s strictly closer to the definition of empathy than 1, but it’s mostly irrelevant to this post. If I had this kind of empathy for the woman in the video, I’d be thinking: “man, my head hurts.”
The common narrative is that as 1 increases, 2 drops to zero, or even becomes positive judgment. This is probably true sometimes, such as when counteracting the fundamental attribution error, but sometimes not: "This person isn't getting their work done, that's somewhat annoying...oh, it's because they don't care about their education? Gaaahhh!!!" I can relate to this.
Regarding relating better without lowering standards, the questions that come to my mind are:
1. Is this a case where things have to get worse before they get better? As in, zero understanding leads to low judgment with suspension of disbelief, motivational understanding leads to high judgment, but full-story understanding returns to low judgment without relying on suspension of disbelief. Is there a way to test this without driving yourself crazy or taking up an inordinate amount of time?
2. Can you dissolve your moral judgment while keeping understanding constant? That is: "this teammate isn't doing their share of the work because they didn't care enough to be prepared...and this isn't a thing I need to be angry about." If this route looks interesting, my suggestion for the first step of the path is to introspect on the anger/disgust/etc. and what it's protecting.