there exists a single, clever insight which would close at least half the remaining distance to AGI
By my recollection, this specific possibility (and neighboring ones, like “two key insights” or whatever) has been one of the major drivers of existential fear in this community for at least as long as I’ve been part of it. I think Eliezer expressed something similar. Something like “For all we know, we’re just one clever idea away from AGI, and some guy in his basement will think of it and build it. That could happen at any time.”
I don’t know your reasons for thinking we’re just one insight away, and you explicitly say you don’t want to present the arguments here. Which makes sense to me!
I just want to note that from where I’m standing, this kind of thinking and communicating sure looks like a possible example of the type of communication pattern I’m talking about in the OP. I’m actually not picky about the trauma model specifically. But it totally fits the bill of “Models spread based significantly on how much doom they seem to plausibly forecast.” Which makes some sense if there really is a severe doom you’re trying to forecast! But it also puts a weird evolutionary incentive on the memes: if they can, they’ll develop mutations that seem very plausible and amplify the feeling of doom, decoupling from that pesky reality which would otherwise slow down how effectively they spread.
I can’t know whether or not that’s what’s going on with your “one clever insight away” model, or with why you’re sharing it the way you are. I’d have to see the reasoning. I don’t mean to dismiss what you’re saying as “just trauma”; that’d be a doubly unkind way of oversimplifying what I’m saying.
But at the same time, I find myself skeptical of any naked AI doom model that sounds like “Look, we’re basically guaranteed to be screwed, but this margin is too small for me to explain. But the conclusion is very very bad.”
I cannot distinguish between that being an honest report of a good model that’s too big or gnarly to explain, versus a virulent meme riding on folks’ subconscious fixation (for whatever reason) on doom.
And thus I put it in my “Well, maybe.” box, and mostly ignore it.
and neighboring ones, like “two key insights” or whatever
I… kinda feel like there’s been one key insight since you were in the community? Specifically I’m thinking of transformers, or whatever it is that got us from pre-GPT era to GPT era.
Depending on what counts as “key”, of course. My impression is there have been significant algorithmic improvements since then, but not on the same scale. To be fair, it sounds like Random Developer has a lower threshold for “key” than I took the phrase to imply.
But I do think someone guessing “two key insights away from AGI” in say 2010, and now guessing “one key insight away from AGI”, might just have been right then and be right now?
(I’m aware that you’re not saying they’re not, but it seemed worth noting.)
(Re the “missed the point” reaction, I claim that it’s not so much that I missed the point as that I wasn’t aiming for the point. But I recognize that reactions aren’t able to draw distinctions that finely.)
By my recollection, this specific possibility (and neighboring ones, like “two key insights” or whatever) has been one of the major drivers of existential fear in this community for at least as long as I’ve been part of it.
I work with LLMs professionally, and my job currently depends on accurate capabilities evaluation. To give you an idea of the scale, I sometimes run a quarter million LLM requests a day. Which isn’t that much, but it’s something.
A year ago, I would have vaguely guesstimated that we were about “4-5 breakthroughs” away. But those were mostly unknown breakthroughs. One of those breakthroughs actually occurred (reasoning models and mostly coherent handling of multistep tasks).
But I’ve spent a lot of time since then experimenting with reasoning models, running benchmarks, and reading papers.
When I predict that “~1 breakthrough might close half the remaining distance to AGI,” I now have something much more specific in mind. There are multiple research groups working hard on it, including at least one frontier lab. I could sketch out a concrete research plan and argue in fairly specific detail why this is the right place to look for a breakthrough. I have written down very specific predictions (and stored them somewhere safe), just to keep myself honest.
If I thought getting close to AGI were a good thing, I believe in this idea enough that I would spend, oh, US$20k out of pocket renting GPUs. I’ll accept that I’m likely wrong on the details, but I think I have a decent chance of being in the ballpark. I could at least fail interestingly enough to get a job offer somewhere with real resources.
But I strongly suspect that AGI leads almost inevitably to ASI, and to loss of human control over our futures.
And thus I put it in my “Well, maybe.” box, and mostly ignore it.
Good. I am walking a very fine line here. I am trying to be just credible and specific enough to encourage a few smart people to stop poking the demon core quite so enthusiastically, but not so specific and credible that I make anyone say, “Oh, that might work! I wonder if anyone working on that is hiring.”
I am painfully aware that OpenAI was founded to prevent a loss of human control, and that it has arguably done more than any other human organization to cause what it was founded to prevent.
(And please note—I have updated away from AI doom in the past, and there are conditions under which I would absolutely do so again. It’s just that 2028 is a terrible year for making updates on my model, since my models for “AI Doom” and “AI fizzle” make many of the same predictions for the next few years.)
I don’t appreciate the local discourse norm of “let’s not mention the scary ideas but rest assured they’re very very scary”. It’s not healthy. If you explained the idea, we could shoot it down! But if it’s scary and hidden then we can’t.
Also, multiple frontier labs are currently working on it, and you think your LessWrong comment is going to make a difference?
You should at least say by when you will consider this specific single breakthrough thing to be falsified.
The universe isn’t obligated to cooperate with our ideals for discourse norms.
Exactly.
The universe doesn’t care if you try to hide your oh-so-secret insights; multiple frontier labs are working on those insights.
The only people who care are the people here getting more doomy and having worse norms for conversations.
There’s quite a difference between a couple of frontier labs achieving AGI internally and the whole internet being able to achieve AGI on a Llama/DeepSeek base model, for example.
One of my key concerns is the question of:
(1) Do the currently missing LLM abilities scale like pre-training, where each improvement requires spending 10x as much money?
(2) Or do the currently missing abilities scale more like “reasoning”, where individual university groups could fine-tune an existing model for under $5,000 in GPU costs and give it significant new abilities?
(3) Or is the real situation somewhere in between?
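For concreteness, here is a toy back-of-envelope sketch of why the gap between regimes (1) and (2) matters so much. Every dollar figure in it is a made-up placeholder for illustration, not an estimate of anyone’s actual costs:

```python
# Toy back-of-envelope comparison of the two scaling regimes above.
# All dollar figures are hypothetical placeholders, not real cost estimates.

PRETRAIN_BASE_COST_USD = 100e6   # assumed cost of the previous pre-training run
PRETRAIN_SCALE_FACTOR = 10       # "each improvement requires spending 10x as much money"
FINETUNE_COST_USD = 5_000        # the "reasoning-style" fine-tune budget mentioned above

def pretraining_regime_cost(n_improvements: int) -> float:
    """Cumulative cost if each successive improvement costs 10x the previous run."""
    return sum(PRETRAIN_BASE_COST_USD * PRETRAIN_SCALE_FACTOR ** k
               for k in range(1, n_improvements + 1))

def finetune_regime_cost(n_improvements: int) -> float:
    """Cumulative cost if each improvement is a cheap fine-tuning recipe."""
    return FINETUNE_COST_USD * n_improvements

for n in (1, 2, 3):
    print(f"{n} improvement(s): pre-training-like ≈ ${pretraining_regime_cost(n):,.0f}, "
          f"fine-tune-like ≈ ${finetune_regime_cost(n):,.0f}")
```

Under these toy numbers, one more pre-training-style improvement costs on the order of a billion dollars and is only available to a handful of organizations, while a fine-tuning-style improvement is within a hobbyist’s budget. That asymmetry is the whole concern.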
Category (2) is what Bostrom described as a “vulnerable world”, or a “recipe for ruin.” Also, not everyone believes that “alignment” will actually work for ASI. Under these assumptions, widely publishing detailed proposals in category (2) would seem unwise.
Also, even if I believed that someone will eventually figure out the necessary insights to build AGI, it would still matter how quickly they do it. Given a choice between dying of cancer in 6 months or in 12 (all other things being equal), I would pick 12.
(I really ought to make an actual discussion post on the right way to handle even “recipes for small-scale ruin.” After September 11th, this was a regular discussion among engineers and STEM types. It turns out that there are some truly nasty vulnerabilities that are known to experts, but that are not widely known to the public. If these vulnerabilities can be fixed, it’s usually better to publicize them. But what should you do if a vulnerability is fundamentally unfixable?)
Exactly! The frontier labs have the compute and the incentive to push capabilities forward, while randos on LessWrong are more likely to study alignment in weak open-source models.
I think we have both the bitter lesson, that transformers will continue to gain capabilities with scale, and also optimizations that apply to intelligent models generally, orthogonally to compute scale. The latter details seem dangerous to publicize widely, in case we happen to be in a world with a hardware overhang that allows AGI or RSI on smaller-than-datacenter clusters of machines today (and I think RSI could be achieved more easily and sooner by a “narrower” coding agent, which would then lead rapidly to AGI).