I don’t appreciate the local discourse norm of “let’s not mention the scary ideas but rest assured they’re very very scary”. It’s not healthy. If you explained the idea, we could shoot it down! But if it’s scary and hidden then we can’t.
Also, multiple frontier labs are currently working on it and you think your lesswrong comment is going to make a difference?
You should at least say by when you would consider this specific single-breakthrough claim to be falsified.
The universe isn’t obligated to cooperate with our ideals for discourse norms.
Exactly
The universe doesn’t care if you try to hide your oh-so-secret insights; multiple frontier labs are working on those insights.
The only people who care are the people here getting more doomy and having worse norms for conversations.
There’s quite a difference between a couple frontier labs achieving AGI internally and the whole internet being able to achieve AGI on a llama/deepseek base model, for example.
One of my key concerns is the following question:
(1) Do the currently missing LLM abilities scale like pre-training, where each improvement requires spending 10x as much money?
(2) Or do the currently missing abilities scale more like “reasoning”, where individual university groups could fine-tune an existing model for under $5,000 in GPU costs and give it significant new abilities?
(3) Or is the real situation somewhere in between?
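To make the contrast concrete, here is a back-of-the-envelope sketch. The constants are illustrative assumptions of mine (a ~$100M frontier pre-training run, 10x cost per further improvement, ~$5,000 per fine-tuning-style improvement), not established figures:

```python
# Back-of-the-envelope comparison of the two scaling regimes above.
# All constants are illustrative assumptions, not measured figures.

PRETRAIN_BASE_COST = 100e6   # assumed cost of a current frontier pre-training run
PRETRAIN_MULTIPLIER = 10     # regime (1): each further improvement costs 10x more
FINETUNE_COST = 5_000        # regime (2): ~$5k of GPU time per improvement

def cumulative_cost(n_improvements: int, regime: str) -> float:
    """Total spend to obtain n successive capability improvements."""
    if regime == "pretraining":
        return sum(PRETRAIN_BASE_COST * PRETRAIN_MULTIPLIER ** i
                   for i in range(1, n_improvements + 1))
    if regime == "finetuning":
        return n_improvements * FINETUNE_COST
    raise ValueError(f"unknown regime: {regime}")

for n in (1, 2, 3):
    print(f"{n} improvement(s): "
          f"pretraining ~${cumulative_cost(n, 'pretraining'):,.0f} vs. "
          f"finetuning ~${cumulative_cost(n, 'finetuning'):,.0f}")
# With 3 improvements: roughly $111 billion vs. $15,000. Only the second
# regime is within reach of "the whole internet" working on an open model.
```

Under regime (1) the capability frontier stays gated by capital; under regime (2) it does not, which is why the distinction matters for what gets published.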
Category (2) is what Bostrom described as a “vulnerable world”, or a “recipe for ruin.” Also, not everyone believes that “alignment” will actually work for ASI. Under these assumptions, wouldn’t widely publishing detailed proposals in category (2) seem unwise?
Also, even if I believed that someone would eventually figure out the insights necessary to build AGI, it would still matter how quickly they did so. Given a choice between dying of cancer in 6 months or in 12 (all other things being equal), I would pick 12.
(I really ought to make an actual discussion post on the right way to handle even “recipes for small-scale ruin.” After September 11th, this was a regular discussion among engineers and STEM types. It turns out that there are some truly nasty vulnerabilities that are known to experts, but that are not widely known to the public. If these vulnerabilities can be fixed, it’s usually better to publicize them. But what should you do if a vulnerability is fundamentally unfixable?)
Exactly! The frontier labs have the compute and the incentive to push capabilities forward, while randos on lesswrong are more likely to study alignment in weak open-source models.
I think we have both the bitter lesson, that transformers will continue to gain capabilities with scale, and also optimizations that will apply to intelligent models generally, orthogonally to compute scale. The latter details seem dangerous to publicize widely, in case we happen to be in a world with enough hardware overhang to allow AGI or RSI on smaller-than-datacenter clusters of machines today (and I think RSI could be achieved more easily and sooner by a “narrower” coding agent, which would then lead rapidly to AGI).