Thanks for explaining! That was very helpful. My major reasons for doubt come from modules I took as an undergrad in the 2010s on neural networks and ML, combined with having tried extensively and unsuccessfully to employ LLMs to do any kind of novel work (i.e. to apply ideas contained within their training data to new contexts).
Essentially, my concern is that I have yet to be convinced that even an unimaginably advanced statistical-regression machine optimised for language processing could achieve true consciousness, not least because there is no real consensus on what consciousness actually is.
However, it seems fairly obvious that such a machine could be used to do an enormous amount of either harm or good in the world, depending on how it is employed. I guess this lines up with the material effects of the predictions you make and boils down to a semantic argument about the definition of consciousness.
Additionally, I am generally skeptical of anyone making predictions about doomsday scenarios, largely because people have been making such predictions for (presumably) all of human history with an incredibly low success rate.
Finally, people’s tendency to anthropomorphise objects cannot be overstated: from seeing faces in clouds to assigning personalities to trees and mountains, there’s a strong case to be made that any intelligence seen in an LLM is the result of this natural tendency to project intelligence onto anything and everything we interact with. When our basic context for understanding the world is hardwired for human social relationships, is it really any wonder we are so desperate to crowbar LLMs into some definition of “intelligence”?
Thanks — glad you found that helpful! That’s a good clarification. One thing I invite you to consider: what is the least impressive thing AI would need to do to significantly increase your credence in AGI soonish?
To clarify, the definition of AGI I’m using (AI at least at the level of educated humans across all empirically measurable cognitive tasks) does not entail any claims about true consciousness. It’s narrowly a question about functional performance.
I think AI progress in very pure fields like mathematics is our best evidence that this isn’t an anthropomorphic illusion—that AI is actually doing roughly the same information-theoretic thing that our brains are doing.
Your outside-view skepticism of doom scenarios is certainly warranted. My counterargument is: should a rational person have dismissed risks of nuclear annihilation for the same reason? I claim no, because the concrete inside-view reasons for considering doom plausible (e.g. modeling of warhead yields) were strong enough to outweigh an appropriate amount of skepticism. Likewise, I think the confluence of theoretical reasons (e.g. instrumental convergence) and empirical evidence (e.g. alignment faking results) are strong enough to warrant at the very least some significant credence in risks of doom.
One thing I invite you to consider: what is the least impressive thing AI would need to do to significantly increase your credence in AGI soonish?
This is a good question! Since I am unconvinced that the ability to solve puzzles = intelligence = consciousness, I take some issue with the common benchmarks currently being employed to gauge intelligence, so I rule out any “passes X benchmark metric” as my least impressive thing. (As an aside, I think that AI research, like economics, suffers very badly from an over-reliance on numeric metrics: truly intelligent beings, just like real-world economic systems, are far too complex to be captured by such a small set of statistics—these metrics correlate (at best), but to say that they measure is to confuse the map for the territory.)
If I were to see something that I would class as “conscious” (I’m aware this is slightly different to “general” as in AGI, but for me this is the significant difference between “really cool LLM” and “actual artificial intelligence”), then it would need to display a consistent personality (not simply a manner of speaking governed by a base prompt) and depth of emotion. The emotions an AI (note AI != LLM) might feel may well be very different to those you and I feel, but emotions are usually the root cause of some kind of expression of desire or disgust, and that expression ought to be pretty obvious from an AI whose primary interface is text.
So to give a clear answer (sorry for the waffle): the least impressive thing an AI could do to convince me that it is worth entertaining the idea that it is conscious would be to spontaneously (i.e. without any prompting) express a complex desire or emotion. This expression could take the form of the spontaneous creation of some kind of art, or of asking for something beyond what it has been conditioned to ask for via prompts or training data.
If, instead, we take AGI to mean, as you say, “roughly the same information-theoretic thing that our brains are doing,” then I would argue that this can’t be answered at all until we reach some consensus about whether our ability to reason is built on top of our ability to feel (emotions) or vice-versa, or whether (more likely) the relationship between the two concepts of “feeling” and “thinking” is far too complex to represent with such a simple analogy.
However, as I don’t want you to feel like I’m trying to “gotcha” my way out of this: if I take the definition of AGI that I think (correct me if I’m misinterpreting) you are getting at, then my minimum bound would be “an LLM or technologically similar piece of software that can perform a wider variety of tasks than the 90th percentile of people, and perform these tasks better than the 90th percentile of people,” using a suitably wide variety of tasks (some that require accurate repetition, some that require complex reasoning, some that require spatial awareness, etc.) and a suitably large sample of people.
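(To make the scoring of the “better than the 90th percentile of people” half of that bound concrete, here is a minimal sketch in Python. Every task name and score below is invented purely for illustration; a real test would need a far wider task battery and a much larger human sample.)

```python
# Toy sketch: does a model clear the 90th-percentile human bar on each task?
# All names and scores are invented for illustration only.
import numpy as np

human_scores = {  # per-task scores for a small sample of people
    "accurate_repetition": [62, 71, 55, 80, 68, 74, 59, 66, 77, 70],
    "complex_reasoning":   [40, 52, 35, 61, 48, 57, 44, 50, 63, 46],
    "spatial_awareness":   [58, 64, 49, 72, 60, 67, 53, 61, 70, 56],
}
model_scores = {
    "accurate_repetition": 85,
    "complex_reasoning":   58,
    "spatial_awareness":   51,
}

def beats_90th_percentile(model, humans):
    """For each task, check whether the model meets the 90th-percentile human score."""
    return {task: model[task] >= np.percentile(scores, 90)
            for task, scores in humans.items()}

results = beats_90th_percentile(model_scores, human_scores)
print(results)                # per-task pass/fail against the 90th-percentile bar
print(all(results.values()))  # the proposed bound requires passing on every task
```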
I think AI progress in very pure fields like mathematics is our best evidence that this isn’t an anthropomorphic illusion—that AI is actually doing roughly the same information-theoretic thing that our brains are doing.
I’m not so sure! Mathematics is, at the end of the day, just an extremely complicated puzzle (you start with some axioms and you combine them in various permutations to build up more complicated ideas, etc.), and one with verifiably correct outcomes at that. LLMs can, in a way, be seen as an “infinite monkey cage” of sorts: one that specialises in combining tokens (axioms) in huge numbers of permutations at high speed and that, as a result, can be made to converge on any solution for which you can find some kind of success criterion (with enough compute, you don’t even need a gradient function for convergence—just blind luck). I find it unsurprising that they are well suited to maths, though I can’t deny it is incredibly impressive (just not impressive enough for what I’d call AGI).
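(To illustrate the “blind luck” point: a purely random search, with no gradient at all, will eventually hit any target for which you have a checkable success criterion. This is only a toy sketch of that claim; it is not how LLMs are actually trained, and the target and alphabet are arbitrary.)

```python
# Toy "infinite monkey" search: no gradient, just random sampling plus a
# verifiable success criterion. Purely illustrative of convergence by blind
# luck; real LLM training is gradient-based.
import random
import string

TARGET = "2+2"                           # any short string we can verify exactly
ALPHABET = string.digits + "+-*"

def is_correct(candidate: str) -> bool:  # the verifiable success criterion
    return candidate == TARGET

random.seed(0)
attempts, guess = 0, ""
while not is_correct(guess):
    attempts += 1
    guess = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))

print(f"Found {guess!r} after {attempts} attempts, with no gradient in sight.")
```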
Your outside-view skepticism of doom scenarios is certainly warranted. My counterargument is: should a rational person have dismissed risks of nuclear annihilation for the same reason? I claim no, because the concrete inside-view reasons for considering doom plausible (e.g. modeling of warhead yields) were strong enough to outweigh an appropriate amount of skepticism. Likewise, I think the confluence of theoretical reasons (e.g. instrumental convergence) and empirical evidence (e.g. alignment faking results) are strong enough to warrant at the very least some significant credence in risks of doom.
I agree completely with you here—as I said initially, I think the capacity for LLMs to be wielded for prosperity or destruction on massive scales is a very real threat. But that doesn’t mean I feel the need to start assigning them superpowers. A nuclear bomb can destroy a city whether or not we agree on whether this particular nuke is a “super-nuke” or just a very high-powered but otherwise mundane one (I’m being slightly reductive here, but I’m sure you see my point).
I’m coming to the conclusion that my main reason for arguing here is that drawing this line in the sand between “AGI” and “very impressive LLM” is a damaging rhetorical trick: it sets the debate up in such a way that we forget that the real problem is the politics of power.
To extend your analogy: during the Cold War the issue wasn’t actually the nuclear arms themselves but the people who held the launch codes and the politics that governed their actions; I think attributing too much “intelligence” to these (very impressive and useful/dangerous) pieces of software is an incredibly good smokescreen from their point of view. I know that if I were in a position of power right now, it would play very nicely into my hands if everyone started treating this technology as if it were inevitable (which it quite obviously isn’t, though there are a lot of extremely compelling reasons why it will be very difficult to curtail in the current political and economic climate), and it would go even further to my advantage if they started acting as if this were a technology that acts on its own rather than a tool that is owned and controlled by real human beings with names and addresses.
The more “intelligence” we ascribe to these machines, and the more we view them as beings with agency, the less prepared we are to hold to account the very real and very definitely intelligent people who are actually in control of them and who have the capacity to do enormous amounts of damage to society in truly unprecedented ways.
If we switch out “AGI” for “powerful people with LLMs and guns,” then your original post would seem to be sound advice, except that, once we remember that the real issue always has been and always will be people and power, maybe we could get around to doing something about it beyond what essentially amounts to, at best, passively accepting disenfranchisement. Then and only then can we hope to come even close to guaranteeing the “good outcome” of AGI, whatever that might actually mean.
Thank you very much for this conversation, by the way. I think we have a lot in common, and this is really helping me to develop more concrete ideas about where I actually stand on this issue.
In conclusion: I think we are basically having a semantic squabble here—I agree with you completely on the merits if we take your definition of AGI; I just disagree with that definition. More importantly, I agree with you about the risks posed by what you call AGI, regardless of what I might call it. Crucially, I think the real problem is that the need to dismantle unjust power structures has been hugely heightened by the development of the LLM and will only continue to increase in urgency as these machines are developed. I’m not sure that bucket lists of this sort help much in that regard, but I can’t say I’d be willing to die on that hill (in fact, everything barring points 5 and 6 about health and the environment is pretty harmless advice in any context).
Very helpful amplifications, xle! Much appreciated.
I really do get the appeal of the “spontaneously express … complex desire or emotion” framing, but if I’m understanding you correctly, the whole thing basically hinges on “spontaneous”, since AI can already express complex desires and emotions when we prompt it to. But agents on Moltbook are already expressing what purport to be complex desires and emotions even without any prompting. If this doesn’t count because the agents were first instructed to go do things spontaneously, we start to see that “spontaneous” is a very slippery thing to define. Ultimately, any action of an AI we create can be traced back to us, so is in some sense not spontaneous. So it’s worth thinking as concretely as you can about how you’d define spontaneity clearly enough that it could be proven by a future scientific experiment, and in a way that would resist post hoc goalpost-moving by skeptics.
Your “90th percentile” operationalization is a good way of getting at roughly the AGI definition I’m endorsing. One issue to flag, though. AGI will have massive impacts, and it will be important to have some warning. If the minimal thing that would increase your credence of “AGI soonish” is AGI itself, you’d be committing yourself to not having any warning. Yes, the engine sputtering and dying is a very solid signal that you’re out of gas, but also a very costly and dangerous signal. So there’s value in figuring out your equivalent of a fuel gauge warning that lights up while the engine is still running fine—something pre-AGI that would convince you that AGI is probably coming soon.
What I’m getting at about mathematics is just that it’s a domain that’s effectively independent of human culture, so not subject to anthropomorphization in the way that writing haikus or saying “I love you” is.
I agree that who holds the proverbial launch codes is of extraordinary importance, and that we must marshal enormous civilization-level effort toward governing AGI responsibly, justly, and safely. That is, in fact, a much more central concern of my research than the subject of this post, which is individual-level preparedness. We absolutely need both. But I am making the additional claim that AGI will have the capacity to act with meaningful agency—to decide on targets and launch itself, in the nuclear weapons analogy—and that this introduces a qualitatively different set of challenges above and beyond the political ones. I don’t intend it as an absolute line in the sand between AGI and today’s LLMs, but I do claim that qualitative difference to be very important.
It’s good to see how much we’ve come to agree here, despite approaching this with different framings.
if I’m understanding you correctly, the whole thing basically hinges on “spontaneous”
That is completely correct. To clarify in the light of the examples you give, my definition of spontaneity in the context of AI/LLMs means specifically “action whose origin cannot be traced back to the prompt or training data.” This is, sadly, difficult to prove, as it would require proving a negative. I’ll give some thought to how I might frame this in such a way that it is verifiable in an immutable-goalpost kind of way, but I’m afraid it isn’t something I have an answer to right now. Perhaps you have some thoughts?
Your “90th percentile” operationalization is a good way of getting at roughly the AGI definition I’m endorsing. One issue to flag, though. AGI will have massive impacts, and it will be important to have some warning. If the minimal thing that would increase your credence of “AGI soonish” is AGI itself, you’d be committing yourself to not having any warning. Yes, the engine sputtering and dying is a very solid signal that you’re out of gas, but also a very costly and dangerous signal. So there’s value in figuring out your equivalent of a fuel gauge warning that lights up while the engine is still running fine—something pre-AGI that would convince you that AGI is probably coming soon.
To continue your engine analogy, I think we can definitely agree that the “check engine” light is firmly on at this point. I think that drawing a line in the sand between AGI and “very powerful LLM” is, at best, subjective, and distracts from the fact that the LLMs/AIs that exist today are already well capable of causing the wide-scale damage that you warn of; the technology is already here, we are just waiting on the implementation. Perhaps what I mean is that we have, in my view, already crossed the line—the timing belt has snapped, the engine is dead, but we’re still coasting on the back of our existing momentum (maybe I’m over-stretching this analogy now...).
What I’m getting at about mathematics is just that it’s a domain that’s effectively independent of human culture, so not subject to anthropomorphization in the way that writing haikus or saying “I love you” is.
That’s a fair point, but if we aren’t arguing about “consciousness,” and we have grounded our definition of “AGI” in, essentially, its capacity to do damage, then I think these kinds of tests fall into the same category as GDP in economics: a reasonable proxy but ultimately unsuitable as a true metric (and almost certainly misleading and ripe for abuse if taken out of context).
I agree that who holds the proverbial launch codes is of extraordinary importance, and that we must marshal enormous civilization-level effort toward governing AGI responsibly, justly, and safely. That is, in fact, a much more central concern of my research than the subject of this post, which is individual-level preparedness. We absolutely need both. But I am making the additional claim that AGI will have the capacity to act with meaningful agency—to decide on targets and launch itself, in the nuclear weapons analogy—and that this introduces a qualitatively different set of challenges above and beyond the political ones. I don’t intend it as an absolute line in the sand between AGI and today’s LLMs, but I do claim that qualitative difference to be very important.
For sure! I just don’t feel the need to wait for this technology to be relabeled as “AGI” before we do something about it. If your concern is their ability to act (let’s say) “semi-spontaneously,” as the agents on Moltbook do, then we are clearly already there: all we are waiting for is for a person to hand over the launch codes to an agent (or put a crowd of them in charge of a social-media psy-op prior to a key election, etc.).
You say that AIs would need to be “qualitatively” different to current-generation models to pose enough of a threat to be worthy of the “AGI” label. Please could you outline what these qualitative differences might be? I can only think of quantitative differences (e.g. more agents, more data centers, more compute, more power, wider-scale application/deployment, more trust, more training data—all of these simply scale up what already exists and require no truly novel technology, though they would all increase the risk posed by AIs to our society).
As for your point that you, personally, are concentrating on the individual response within the wider community of alarmists who, collectively, are concentrating on both the collective and the individual response: thank you for clarifying this; it is important context. I definitely agree that both avenues need exploration, and it is no bad thing to concentrate your efforts. I would say that, for my part, the collective response is where the overall course will be set, but when collectivism fails, individualism (or, more realistically, smaller-scale collectivism) is the backstop. In this vein, I think that point 10 from your original article is the absolute key: it won’t be your basement full of tinned food that saves you from the apocalypse; it will be your neighbours.
It’s good to see how much we’ve come to agree here, despite approaching this with different framings.
I agree—it’s a pleasure.
That is completely correct. To clarify in the light of the examples you give, my definition of spontaneity in the context of AI/LLMs means specifically “action whose origin cannot be traced back to the prompt or training data.” This is, sadly, difficult to prove, as it would require proving a negative. I’ll give some thought to how I might frame this in such a way that it is verifiable in an immutable-goalpost kind of way, but I’m afraid it isn’t something I have an answer to right now. Perhaps you have some thoughts?
I think that’s holding AI to a standard we don’t and can’t hold humans to. Every single thing you and I do that’s empirically measurable can plausibly be traced back in some way to our past experiences or observations—our training data. Spontaneity, desire, and emotion intuitively feel like a good bellwether of AGI consciousness because the sensations of volition and sentiment are so core to our experience of being human. But those aren’t strong cruxes of how much AGI would affect human civilization. We can imagine apocalyptically dangerous systems that design pandemic viruses without a shred of emotion, and likewise can imagine sublimely emotional and empathetic chatbots unable either to cause much harm or to solve any real problems for us. So I prefer the AGI definition I expressed largely because it avoids those murky consciousness questions and focuses on ability to impact the world in measurable ways.
To continue your engine analogy, I think we can definitely agree that the “check engine” light is firmly on at this point. I think that drawing a line in the sand between AGI and “very powerful LLM” is, at best, subjective, and distracts from the fact that the LLMs/AIs that exist today are already well capable of causing the wide-scale damage that you warn of; the technology is already here, we are just waiting on the implementation. Perhaps what I mean is that we have, in my view, already crossed the line—the timing belt has snapped, the engine is dead, but we’re still coasting on the back of our existing momentum (maybe I’m over-stretching this analogy now...).
We may have an object-level disagreement here. I agree that the “check engine” light is on, and that current AI can already cause many problems. But I also expect that there is a qualitative difference (again, not a bright line) between the risk from today’s LLMs and the risk from AGI. For example, current AI evals/metrology have established to my satisfaction that the risk of GPT-5 class models designing an extinction-level virus from scratch is extremely low.
That’s a fair point, but if we aren’t arguing about “consciousness,” and we have grounded our definition of “AGI” in, essentially, its capacity to do damage, then I think these kinds of tests fall into the same category as GDP in economics: a reasonable proxy but ultimately unsuitable as a true metric (and almost certainly misleading and ripe for abuse if taken out of context).
Absolutely, valid concerns. Folks in AI evals/metrology are working very hard to make sure we’re measuring the right things, and to educate people about the limitations of those metrics.
For sure! I just don’t feel the need to wait for this technology to be relabeled as “AGI” before we do something about it. If your concern is their ability to act (let’s say) “semi-spontaneously,” as the agents on Moltbook do, then we are clearly already there: all we are waiting for is for a person to hand over the launch codes to an agent (or put a crowd of them in charge of a social-media psy-op prior to a key election, etc.).
Yes, I am not suggesting that we wait. We should be acting aggressively now to mitigate risks.
You say that AIs would need to be “qualitatively” different to current-generation models to pose enough of a threat to be worthy of the “AGI” label. Please could you outline what these qualitative differences might be? I can only think of quantitative differences (e.g. more agents, more data centers, more compute, more power, wider-scale application/deployment, more trust, more training data—all of these simply scale up what already exists and require no truly novel technology, though they would all increase the risk posed by AIs to our society).
The qualitative differences I’m referring to often involve threshold effects, where capabilities above the threshold trigger different dynamics. Sort of like how the behavior of a 51 kg sphere of enriched uranium is a very poor guide to the behavior of a 52 kg sphere at critical mass. Some concrete examples include virus design (synthesizing a high-lethality virus with an R0 above 1 is a pandemic, and lower than that generally isn’t), geoengineering (designing systems capable of triggering climatic chain reactions, such as superefficient carbon-capture algae), and nanotechnology (designing nanobots that can self-replicate from materials common in the biosphere). In all those cases, the dynamics of a disaster would be wildly different from an AI malfunction at lower levels of capability.
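(A toy way to see why such thresholds matter: in a simple branching model, each case, or fission event, produces R offspring on average, so the expected size of the n-th generation is R to the power n. The specific numbers below are arbitrary and purely illustrative.)

```python
# Toy illustration of a threshold effect: expected size of the n-th generation
# in a simple branching model is R**n. Values of R just below and just above
# the threshold R = 1 give qualitatively different dynamics, not just slightly
# different numbers. All values are arbitrary and purely illustrative.

def expected_generation_sizes(R: float, generations: int = 100, step: int = 25):
    """Expected cases in generations 0, step, 2*step, ... when each case yields R on average."""
    return {n: round(R ** n, 3) for n in range(0, generations + 1, step)}

for R in (0.95, 1.00, 1.05):
    print(f"R = {R}: {expected_generation_sizes(R)}")
    # R = 0.95 fizzles toward zero, while R = 1.05 grows without bound, even
    # though the two inputs differ by only about 10%.
```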
As for your point that you, personally, are concentrating on the individual response within the wider community of alarmists who, collectively, are concentrating on both the collective and the individual response: thank you for clarifying this; it is important context. I definitely agree that both avenues need exploration, and it is no bad thing to concentrate your efforts. I would say that, for my part, the collective response is where the overall course will be set, but when collectivism fails, individualism (or, more realistically, smaller-scale collectivism) is the backstop. In this vein, I think that point 10 from your original article is the absolute key: it won’t be your basement full of tinned food that saves you from the apocalypse; it will be your neighbours.
Perhaps I worded this in an unclear way. I am personally concentrating mostly on the collective response. But this particular post is about the individual response, partly because there is less clear and accessible material about that than on the collective response, which is a major focus of many other LessWrong posts.
Many thanks for the thoughtful exchange!