I think if you expect some architecture to be a lot more efficient than current LLMs, you should not talk about it!
LessWrong isn’t a secret society with vows to not spread the knowledge we’ve learned here if it might cause the end of the world. It’s a public website, and many who work on AI capabilities read it.
If you want the community to pay attention to the fact that other architectures might scale well, that’s okay, feel free to talk about it.
If you think you have exceptionally good intuitions about what will scale, sharing this publicly eats maybe the single most valuable common resource that we have: the timeline. Please don’t do that & don’t accelerate AI capabilities.
Hey Mikhail,
I don’t claim to have any secret knowledge or exceptional intuition about what will scale well. Everything I link to is already public and already quite well known (test-time training was the number 2 pinned post on r/singularity, the Hierarchical Reasoning Model is blowing up on X/Twitter, and ARC-AGI without pretraining was on the front page of Hacker News). In fact, commenters keep bringing up links that I already featured in the main post, which suggests these developments are well known even within the LessWrong memesphere.
With that in mind, my goal was less to warn about any individual architecture than to point out the overall trend of alternative-architecture work I’m observing, and some reasons I expect the trend to continue. I want to make sure that AI safety does not over-index on LLM safety and ignore other avenues of risk. Basically exactly what you said here:
“If you want the community to pay attention to the fact that other architectures might scale well, that’s okay, feel free to talk about it.”
Hope this makes sense.