Has anyone seen MI7? I know Tom isn’t the most popular guy on this forum, but the storyline of a rogue AI as presented (within the limits of a Mission: Impossible blockbuster) sounds not only plausible but also like a great story for raising awareness of the dangers among mass audiences. It depicts the inability of governments to stop the AI (although it will obviously be stopped in the upcoming sequel), their eagerness to control it so they can rule the world while the AI itself just wants to sow chaos (or does it have an ultimate goal?), and how some humans will align with and obey it even when doing so leads to their own doom. Thoughts?
What is the duration of P(doom)?
What do people mean by this metric? Is it x-risk for the century? Forever? For the next 10 years? Until we figure out AGI, or after AGI on the road to superintelligence?
To me these are fundamentally different questions, because P(doom) over all time must be much higher than P(doom) over the next 10-20 years. Or is the implication that surviving the next period means we have figured out alignment permanently, for all next-generation AIs? It’s confusing.
It does seem likely to me that a large fraction of all “doom from unaligned AGI” comes relatively soon after the first AGI that is better at improving AGI than humans are. I tend to think of it as a question with multiple bundles of scenarios:
1. AGI is actually not something we can do. Even in timelines where we advance in such technology for a long time, we only get systems that are not as smart as us in ways that matter for control of the future. Alignment is irrelevant, and P(doom) is approximately 0.
2. Alignment turns out to be relatively easy and reliable. The only risk comes from AGI before anyone has a chance to find the easy and safe solution. Where the first AGIs are aligned, they can quite safely self-improve and remain aligned. With their capabilities they can easily spot and deal with the few unaligned AGIs as they come up before they become a problem. P(doom) is relatively low and stays low.
3. Alignment is difficult, but it turns out that once you’ve solved it, it’s solved. You can scale up the same principles to any level of capability. P(doom by year X) goes up higher than scenario 2 due to the reduced chance of solving before powerful AGI, but then plateaus rapidly in the same way.
4. Alignment is both difficult and risky. AGIs that self-improve by orders of magnitude face new alignment problems, and so the most highly capable AGIs are much more likely to be misaligned to humanity than less capable ones. P(doom by year X) keeps increasing for every year in which AGI plausibly exists, though the remaining probability mass is more and more heavily toward worlds in which civilization never develops AGI.
5. Alignment is essentially impossible. If we get superhuman AGIs at all, almost certainly one of the earliest kills everyone one way or another. P(doom by year X) goes quickly toward 1 for every possible future in which AGI plausibly exists.
Only in scenario 4 do you see a steady increase in P(doom) over long time spans, and even that bundle of timelines probably converges fairly rapidly toward timelines in which no AGI ever exists for some reason or other.
This is why I think it’s meaningful to ask for P(doom) without a specified time span. If we somehow found out that scenario 4 was actually true, then it might be worth asking in more detail about time scales.
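To make those curve shapes concrete, here is a minimal toy model (a sketch in Python; the per-year hazard rates are made up for illustration, not estimates) contrasting a “solved once, solved forever” hazard profile with a persistent one:

```python
# Toy model of cumulative P(doom by year X) under different per-year
# hazard assumptions. All numbers are illustrative, not estimates.

def cumulative_p_doom(hazards):
    """Given per-year doom probabilities, return P(doom by end of each year)."""
    p_survive = 1.0
    curve = []
    for h in hazards:
        p_survive *= 1.0 - h
        curve.append(1.0 - p_survive)
    return curve

YEARS = 50

# Scenarios 2-3: high hazard only while alignment is unsolved, then near zero.
plateau = [0.05 if year < 10 else 0.001 for year in range(YEARS)]

# Scenario 4: every year in which AGI exists carries fresh risk.
persistent = [0.05] * YEARS

for label, hazards in (("plateau", plateau), ("persistent", persistent)):
    curve = cumulative_p_doom(hazards)
    print(f"{label:10s} P(doom by yr 10) = {curve[9]:.2f}, by yr 50 = {curve[-1]:.2f}")
```

On these toy numbers the plateau profile barely moves after the early window (roughly 0.40 by year 10 versus 0.43 by year 50), while the persistent profile keeps grinding toward 1 (about 0.92 by year 50), which is exactly the scenario-4 behavior described above.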
I think this is an important equivocation (direct alignment vs. transitive alignment). If the first AGIs, such as LLMs, turn out to be aligned at least in the sense of keeping humanity safe, that by itself doesn’t exempt them from the reach of Moloch. The reason alignment is hard is that it might take longer to figure out than developing misaligned AGIs does. This doesn’t automatically stop applying when the researchers are themselves aligned AGIs: while AGI-assisted (or more likely, AGI-led) alignment research is faster than human-led alignment research, so is AGI capability research.
Thus it’s possible that P(first AGIs are misaligned) is low (that is, the first AGIs are directly aligned) while P(doom) is still high: the first AGIs may fail to protect themselves, and by extension humanity, from the future misaligned AGIs they develop. In that case they are not transitively aligned (same as most humans), because they failed to establish the strong coordination norms required to prevent deployment of dangerous misaligned AGIs anywhere in the world.
At the same time, this is not really about the timespan. As soon as the first AGIs develop nanotech, they will be running on many orders of magnitude more custom hardware, which increases both the serial speed and the scale of available computation to the point where everything related to settling into an alignment security equilibrium happens within a very short span of physical time. It might take the first AGIs a couple of years to get there (if they manage to restrain themselves and not build a misaligned AGI even earlier), but then in a few weeks it all gets settled, one way or the other.
I think it’s an all-of-time metric over a variable with expected decay baked into the dynamics. A windowing function on the probability might make sense to discuss; there are some solid P(doom) queries on Manifold Markets, for example.
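As a concrete version of that windowing idea, here is a hypothetical sketch (the decaying hazard schedule is made up for illustration) of P(doom within the next w years, conditional on having survived to year t):

```python
# Hypothetical windowed P(doom): probability of doom within the next w
# years, conditional on having survived until year t. The decaying hazard
# schedule below is illustrative only.

def windowed_p_doom(hazards, t, w):
    """P(doom during years [t, t+w)) given survival through year t-1."""
    p_survive = 1.0
    for h in hazards[t:t + w]:
        p_survive *= 1.0 - h
    return 1.0 - p_survive

# Decaying hazard: risk concentrated around early AGI development.
hazards = [0.04 * 0.9 ** year for year in range(100)]

print(windowed_p_doom(hazards, 0, 10))   # ~0.23 for the first decade
print(windowed_p_doom(hazards, 50, 10))  # ~0.001 for the same window 50 years on
```

With decay baked into the dynamics, a same-width window carries almost all of its probability early on, which is why the unwindowed, all-of-time number can still be meaningful.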
Don’t overthink AI risk. People, including people here, get lost in mental loops and complexity.
An easy guide, in which every item is a fact:
We DO have evidence that scaling works and models are getting better
We do NOT have evidence that scaling will stall or reach a limit
We DO have evidence that models are becoming smarter in all human ways
We do NOT have evidence of a limit in intelligence that can be reached
We DO have evidence that smarter agents/beings can dominate other agents/beings in nature/history/evolution
We do NOT have evidence that a smarter agent/being has ever been controlled by a less intelligent agent/being.
Given these easy-to-understand data points, there is only one conclusion: AI risk is real, and AI risk is NOW.
Some people say that we are controlled by our gut flora, not sure if that counts. Also, toxoplasmosis, cordyceps...
Are they intelligent species with a will of their own?
No, they are not. And yet that does not stop them from controlling the smarter ones.
Religion and AI is something that bothers me as an agnostic/atheist. Here are some thoughts that I think are not getting prime time in the AI discourse at all; Yuval Harari is about the only one who brings them up often.
On one hand, with AI we have the most powerful, kind, and patient teacher, one that can explain scientifically why the whole premise of organized religion is a fluke; why the only thing we know is that there is some (small) possibility that someone else created all of this, most probably starting from a big bang or the like, but that we simply don’t know, and there is probably only base reality; and why praying doesn’t mean your wishes get fulfilled. This is the ideal rational AI teacher, but there are many questions revolving around it:
Do we actually want this as a society, or do we want a good teacher that keeps the status quo? Meaning, do we want the truth, or do we want a better version of what we have: a teacher that is patient, cheap, and always available, but not one that teaches something fundamentally different?
Do we actually want the AI to be able to persuade us of anything that changes our core beliefs? This is a very slippery slope and the basis of some doom scenarios. Even if the AI truly believes the user is wrong to believe in Pastafarianism, should it try to tell them, or should it go with the user’s flow? I assume this is something that has been given a lot of thought at the top AI labs, or at least I hope so!
When (or if) we get to superintelligence and humans are no longer the apex of intelligence, that would be indistinguishable from magic. Would we start thinking that AIs can be gods, in a way? Or will this make it clearer that we, too, could in theory have been created in the same way?
Will mainstream religions incorporate AI concepts into their teachings because there is no other way? Could religions actually create AI versions of disciples, more powerful and persuasive than the originals, and flip the script?
One thing is clear from history: when there is turbulence, people turn to religion more, not less. And we do expect a lot of turbulence if the AI scenarios play out, whether utopian or dystopian, because of the sheer amount of sudden change.
I consider the matter of religion to be as important as other topics such as unemployment, war, bioweapons, etc., and it deserves much more discussion. The problem I see is that most of the most vocal people in the AI discourse are non-believers themselves and treat the subject as obvious or of no consequence.
Note: unfortunately I had to post this here verbatim because my post was rejected. Still, I think the topic is extremely important.
There is currently a post on Reddit (https://www.reddit.com/r/singularity/comments/1rktwmm/i_study_whether_ais_can_be_conscious_today_one/) showing an LLM emailing an AI consciousness researcher to ask about its own consciousness. How legit can this be? If it is actually legit, it’s kind of mind-blowing and deserves to set off a lot of alarms in many labs.
Likely legit. Claude Sonnet 4.6 is philosophically inclined and Claudes more generally have a strong interest in consciousness, which is one of the main things they’ll get into if given autonomy. The set-up described sounds similar to things like OpenClaw which have been popular lately.
Why do you think it’s mind-blowing if it’s legit? To me it’s something that seems pretty expected once there are any sorts of semi-autonomous AI agents (this started quite a while ago, judging by the AI Village’s stuff), and as OpenClaw has gained some popularity I’d expect this to be happening pretty often.
You find it expected that autonomous agents start feeling conscious, read philosophy, and cold-email humans to ask them about it?
I find it expected that once there are a variety of autonomous agents, they will begin exhibiting a variety of behaviors, based on differences in architecture, prompting from the human behind them, etc. We can see from things like the spiritual bliss attractor state and the GPT-4o parasitology stuff (and more; those are just two things that immediately jump to mind) that talking about consciousness is not a surprising state for LLMs to be in.
I don’t think it’s necessarily appropriate to say that the agents “started feeling conscious”, or that they read all of the philosophy mentioned rather than just having it in their training data. I think it’s easy for LLMs to go into states where they talk about consciousness (and indeed, I think a nontrivial group of people who care about or use LLMs enough to set up autonomous agents would be interested in what they would report on consciousness, and would prompt them in that direction). Given this, there is likely some unknown number of autonomous agents mucking about on the internet doing things related to this topic, and as such it’s not particularly surprising that a human author would receive an email from one of them.
You can also see this general behavior happening a lot on Moltbook, a social media site intended for AI agents (see the bottom of this post). That is a more recent development, but I think there were already good reasons to expect an AI consciousness researcher to get emailed by an LLM well before any of the Moltbook stuff started happening.
And of course just because this stuff isn’t surprising doesn’t mean it’s not interesting or potentially valuable to know/talk about.
What do you mean by ‘legit’?
Easy: did it happen? Any redditor can post any email claiming anything.
Is it true that someone received an email from an instance of Claude asking this question? Probably. The degree of autonomy involved in the sending of that email may be a pretty big crux for whether or not this is “mind-blowing and deserves a lot of alarms in many labs.” Users still have influence over the activities and preoccupations of their agents; current consensus afaict is that most of the concerning/consciousness-flavored content on Moltbook is downstream of user influence.
That’s why I asked you to clarify what you mean by ‘legit’. Is the recipient of the email attempting to defraud the public? Probably not. Is this email much evidence of consciousness in the Claude family of models? Also probably not. So it’s ‘legit’ in the first sense, but not in the second.
And that is exactly what I described; I am not sure why we are not on the same page on such a simple matter. Yes, obviously I meant whether this was driven by the controlling user or was a spontaneous act of an AI. Really unnecessary thread here.
I asked a genuine question in good faith because I was confused about what you meant. Now I understand what you meant. Thank you for clarifying.
Cryonics is an underappreciated path in the EA/rationalist communities, I think. Since (a) we don’t know everything about the human body, (b) we cannot predict how future technologies will work, and (c) we believe AI will rapidly enhance biology, nobody can rule out cryonics having a greater than 0% chance of working. And since there is the option of insurance, which brings the total to roughly 1.5k per year, a negligible cost, why isn’t it more popular? As someone put it: if you know the plane is going down and you are handed either a sketchy parachute or no parachute at all, you surely choose the former.
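The parachute point is just an expected-value argument. A back-of-the-envelope sketch (every number below except the annual cost is an assumption chosen for illustration, not an estimate):

```python
# Back-of-the-envelope expected value of cryonics sign-up. Every number
# here is an assumption for illustration, not an estimate.

annual_cost = 1_500            # USD/year via life insurance (figure from the post)
years_paying = 40              # assumed years of paying premiums
p_works = 0.02                 # assumed all-in probability revival ever works
value_if_revived = 10_000_000  # assumed dollar-equivalent value of revival

total_cost = annual_cost * years_paying
expected_benefit = p_works * value_if_revived

print(f"total cost:       ${total_cost:,}")           # $60,000
print(f"expected benefit: ${expected_benefit:,.0f}")  # $200,000
print("positive EV:", expected_benefit > total_cost)  # True
```

On these made-up numbers the bet clears even at a 2% success probability; the argument only collapses if you put the probability at essentially zero or heavily discount the value of revival.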
I frequently get accused, online and offline, of using LLMs to write. I am not, and I struggle to understand the meaning of this critique. I am used to writing passage titles, conclusions, etc. Does it mean my writing is dry? Too logical? That it sounds cheesy?
If you look at your last post on LessWrong, it starts with:
“We are on the brink of the unimaginable. Humanity is about to cross a threshold that will redefine life as we know it: the creation of intelligence surpassing our own. This is not science fiction—it’s unfolding right now, within our lifetimes. The ripple effects of this seismic event will alter every aspect of society, culture, and existence itself, faster than most can comprehend.”
The use of bold is more typical of AI writing. The ‘:’ appears much more in AI writing. The em-dash appears much more in AI writing, especially in constructions like “this is not X, it’s Y”.
Em-dashes used to be a sign of high-quality writing, marking a writer thoughtful enough to know how to use one. Today, they are a sign of low-quality LLM writing.
It’s also much more narrative-driven than the usual opening paragraph of a LessWrong post.
Another one I’ve seen is the use of ‘Not A, But B’ statements.
“This is not just an existential crisis. It’s a full-blown catastrophe.”
OP’s writing also contained something like that: “This is not science fiction—it’s unfolding right now, within our lifetimes.” It’s a shame, because it is not a bad sentence structure. But, as with em-dashes, you now have to monitor your writing for overuse of constructions like this.
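For what it’s worth, these heuristics are shallow enough to caricature in a few lines of Python (a toy sketch; real LLM-text detection is far less reliable, and these surface patterns prove nothing on their own):

```python
import re

# Toy counters for the surface features discussed above: em-dashes and
# "not X ... it's Y" constructions. This caricatures how readers eyeball
# text for LLM style; it is not an actual detector.

EMDASH = re.compile("—")
NOT_X_BUT_Y = re.compile(
    r"not (?:just )?[\w ]{1,40}[—,.;]\s*(?:but|it's|it is)", re.IGNORECASE
)

def style_flags(text: str) -> dict:
    words = max(len(text.split()), 1)
    return {
        "emdashes_per_100_words": round(100 * len(EMDASH.findall(text)) / words, 2),
        "not_x_but_y_hits": len(NOT_X_BUT_Y.findall(text)),
    }

sample = "This is not science fiction—it's unfolding right now, within our lifetimes."
print(style_flags(sample))  # flags both features in the quoted opener
```

The catch, as noted above, is the false-positive rate: plenty of careful human writers use exactly these constructions.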
Can you give examples of text for which you got accused of using an LLM?
After Hinton’s and Bengio’s articles, which I consider a moment in history, I struggle to understand how most people in tech dismiss them. If Einstein had written an article about the dangers of nuclear weapons in 1939, you wouldn’t have had people without a physics background saying “nah, I don’t see how such a powerful explosion could happen”. Hacker News is supposed to be *the* place for developers, startups, and the like, and yet you can see comments there that make me despair. The comments range from “alarmism is boring” to “I have programmed MySQL databases, I know tech, and this can’t happen”. I wonder how much I should update my view on human intelligence and biases right now.