As someone who has loved the idea of enlightenment/individual transcendence for a long time, I now fall on the side of believing it is impossible.
Many famous historical figures were spiritually intense. This helped some (e.g. Muhammad, Louis IX) excel at rule and warfare. But nowadays the most effective people seem like they were born with some unusual inclination and just gradually accumulated power in their domain. In contrast, the most spiritually enlightened people are characterized by talking about enlightenment and spirituality all the time, rather than by their preeminence in mathematics or philosophy or government.
I’m new to rationality, but I get the impression that rationalist training hasn’t been able to achieve individual transcendence either. “A Sense That More is Possible” was written 17 years ago, yet rationalists are not yet widely known for their evident auras of awesomeness, and CFAR declared “narrative bankruptcy” in 2022. If there has been some meaningful progress that a veteran could report, I’d be interested to hear it.
The biggest changes in society seem to come from new spiritual and social technologies (e.g. Christianity, academic institutions, capitalism, science) that coordinate behavior across many individuals rather than any individual awakening. In this sense maybe LessWrong is the significant innovation rather than rationalist training.
I personally believe that my exposure to the rationalist community has made me significantly more intelligent and capable along a number of axes. It has also caused me to move from working in chemistry to working in AI alignment. Overall, this is probably a 10-100x multiplier on my impact on the future; it’s just that most people don’t have any impact at all.
This is a really very impressive feat for a web forum and Harry Potter fanfiction to achieve (N=1, but I doubt I’m atypical), but like, 1-2 OOMs isn’t much in the grand scheme of things. If everyone benefited as strongly as I did, then yeah, the world would change in aggregate, but with only a few hundred such people (upper-bound guess), I think you shouldn’t expect anything massive.
I also suspect that my specific personality factors are important here. I have very high g, which meant there was a lot of horsepower in my brain for engagement with rationalist literature to focus. A lot of the benefits for me have been unhobbling, though not all. I lack the ability or desire to work 60-hour weeks (or even 40-hour weeks, to be honest; about 30 hours of reliable productivity plus 5 hours of sporadic insight might be a better description), and I lack a certain insane agency which is somewhat common among rats (the kind that Mikhail Samin has, for example), though I’m still more agentic than most people I know. I think both of these things are slowly improving. Perhaps given five or ten years I’d be able to be a real founder, but then again perhaps not!
Maybe a good model is that there are three or four important axes that define achievement. Along each axis, a person has a raw skill factor (heavy-tailed) and a hobbledness factor (between zero and one), and you multiply all the parameters together to get their overall score. Rationality is good at unhobbling but not so good at bumping up the heavy-tailed factors by an OOM.
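A minimal sketch of that multiplicative model, assuming each axis contributes raw skill times its hobbledness factor and the overall score is the product across axes. The function name and every number are hypothetical, chosen only for illustration. It shows why unhobbling has a ceiling (at most 1 divided by the product of the hobbledness factors) while a one-OOM jump in a single skill factor multiplies the score by 10 regardless of how hobbled someone is.

```python
import math

def overall_score(skills, hobbledness):
    """Hypothetical multiplicative model: each axis contributes raw skill times
    its hobbledness factor, and the overall score is the product across axes."""
    if len(skills) != len(hobbledness):
        raise ValueError("need one hobbledness factor per axis")
    if any(not 0.0 < h <= 1.0 for h in hobbledness):
        raise ValueError("hobbledness factors must lie in (0, 1]")
    return math.prod(s * h for s, h in zip(skills, hobbledness))

# Illustrative person with three axes (all numbers made up).
skills = [3.0, 2.0, 5.0]    # raw skill factors (assumed heavy-tailed across people)
hobbled = [0.5, 0.4, 0.6]   # 1.0 would mean completely unhobbled

base = overall_score(skills, hobbled)

# Best case for unhobbling: every hobbledness factor raised to 1.0.
# The gain is capped at 1 / prod(hobbled), here roughly 8.3x.
unhobbled = overall_score(skills, [1.0] * len(skills))
max_unhobbling_gain = unhobbled / base

# An order-of-magnitude jump in one skill factor multiplies the score by 10,
# with no ceiling of that kind.
boosted = overall_score([30.0, 2.0, 5.0], hobbled)
skill_gain = boosted / base

print(f"unhobbling: x{max_unhobbling_gain:.1f}, one skill OOM: x{skill_gain:.1f}")
# prints: unhobbling: x8.3, one skill OOM: x10.0
```

Under these assumptions, how much unhobbling can help is bounded by how hobbled someone starts out, whereas order-of-magnitude differences between people would have to come from the heavy-tailed skill factors.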
When you say your exposure to the rationalist community has made you significantly more intelligent, not just more capable (I assume you were alluding to these separately in the way that you’d allude to a base model’s capability level separately from its agent harness), how do you mean?
Mostly it has been along the “unhobbling” axes. I have (I think) better instincts for probability and estimation, which only takes a little feedback; it’s mostly about getting in touch with one’s inner sim.
I can more easily spot flaws in arguments, particularly of the flavor “this evidence is far too strong; you have done something very wrong”, e.g. the claim that “small Mistral models can find the same vulnerabilities in code that Mythos did”, which was obviously not true, even if Mythos was overhyped.
Most of this feels like my brain is “de-noised” in a way, e.g. better at searching through an argument for flaws. It’s like my thoughts are less grasping, less likely to stick to the first thing I notice or am presented with, so I’m better at thinking.
E.g. in the Mythos example, a worse version of me might have grabbed onto the claim that Mythos is overhyped and started arguing about Anthropic’s overall integrity, etc., instead of noticing that the authors had claimed that, on many of their benchmarks, small open models beat GPT-5 and Claude 4.5, and that no model was really better than any other. That is obvious nonsense and discredits their entire blog post.
Interesting, thanks, that jibes with my own experience as well. I’m mainly concerned about the thing Buck pointed out: that my “brain de-noising” has progressed more for evaluating external arguments than for evaluating the ones I come up with.
Yes.
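The “this evidence is far too strong” heuristic can be sketched as a Bayes update on a binary hypothesis, here assumed to be “the benchmark methodology is sound”. All probabilities below are made-up placeholders, not estimates about the actual blog post. The point is that a result which would be very unlikely if the method were sound, but fairly likely if it were broken, mostly ends up discrediting the method.

```python
def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule for a binary hypothesis: returns P(H | observation)."""
    joint_true = prior * p_obs_if_true
    joint_false = (1.0 - prior) * p_obs_if_false
    return joint_true / (joint_true + joint_false)

# Hypothetical numbers for illustration only.
prior_sound = 0.7          # prior that the benchmark methodology is sound
p_result_if_sound = 0.01   # chance of "small open models match frontier models" if sound
p_result_if_flawed = 0.3   # chance of the same result if the method is broken

post_sound = posterior(prior_sound, p_result_if_sound, p_result_if_flawed)

# Likelihood ratio of roughly 30:1 in favor of "the method is flawed".
likelihood_ratio = p_result_if_flawed / p_result_if_sound

print(f"P(sound | result) = {post_sound:.3f}")
# prints: P(sound | result) = 0.072
```

In odds form, prior odds of 7:3 divided by a likelihood ratio of about 30 leave the “sound methodology” hypothesis at roughly 7%, which is why the surprising headline result counts against the authors rather than against the frontier models.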
An individual human is limited to 24 hours a day, a large part of which is taken up by “maintenance” (sleep, job, exercise, preparing food: all the things you need to do regularly even if they are not your expertise), and only what is left can be spread across the things you want to become great at.
And while some people are good at collecting lots of money quickly, which allows them to skip (job) or outsource (preparing food) some of the maintenance, they are ultimately still limited by the 24 hours.
I think the strongest reason the original narrative was changed or abandoned is simply that the average rationalist has gotten older since the days of the Sequences. Generalizing from my own example: when you are a student, it seems like you have unlimited time and energy, and the only problem is to focus it on something useful (study science instead of pseudoscience, avoid getting politically mindkilled, etc.). When you later get a job, it may consume most of your time and energy, and when you have kids, they will take the rest. Suddenly your options are very limited, and you must rely on existing infrastructure (e.g. many people send their kids to school because they need two incomes, which is incompatible with homeschooling).
Also, you will notice that many specialists in self-improvement are actually only good at talking about self-improvement, which feels like a scam. (For example, Kiyosaki is famous for giving advice on how to run your own business, but in reality all his businesses have failed, except for the “making money by talking about being a successful businessman” business.) They are experts at “talking about X” without being experts at “X”. Even spiritual superstars these days often sexually abuse their students, because “talking about moral behavior” and “moral behavior” are two different specializations, and it is difficult to be an expert in both at the same time.
Many successful rationalists become invisible: they spend so much time and energy doing whatever they are successful at that they don’t have much left for blogging. (Perhaps someone should specialize in interviewing them?) Also, having mastered the basic general lessons, they move on to the technical details of the thing they specialize in, and those technical details are less interesting to people who chose a different specialization. The visible rationalists are those whose specialization is blogging, because being visible is part of their job. There is nothing wrong with successful blogging, but I think it is worth noticing explicitly that it is mostly the bloggers who become famous.
LessWrong is a communication infrastructure. Lighthaven is a physical implementation of it. They help a lot. The obvious question is whether we could go further in this direction. What other pieces of infrastructure could be built?
People used to talk a lot about group houses. Having rationalists live close to each other seems like a great thing. But then we got a lot of drama. Or maybe that is just selection bias: the failures became famous while the successes went unnoticed? I don’t know.
...I probably won’t write anything smart here, but I share the intuition that the next needed thing is some piece of infrastructure, in the broad sense. We are not a set of individuals who will each become superheroes, but we could have a few heroes who improve themselves and create tools that make others 10% more efficient. (Creating and maintaining such a tool becomes the hero’s full-time job, so we will need another hero for another tool, and maybe we are bottlenecked on human talent.) Perhaps this effect could accumulate.
On that last bit, it’d be interesting to get your reaction to the emerging protocol scene (institute, magazine, the old Summer of Protocols program, and I guess the natural starting point to all this, The Unreasonable Sufficiency of Protocols).
Sounds interesting, but I would like to see some specific examples. I don’t mean examples of useful things that the authors choose to classify as protocols, but examples of when the act of classifying something as a protocol made it easier to solve.
Genuine question: if AI capabilities research stopped today and larger models stopped being trained, wouldn’t AI alignment research effectively be halted?
I’m assuming that the primary goal of AI alignment research is to prevent AGI and ASI from being existential risks. My main question is: how can methods for AGI/ASI alignment be discovered before AGI/ASI exists?
AI alignment results tend to be either positive (“we succeeded in making Claude more honest”) or negative (“we got ChatGPT to kill someone”).
The positive results seem unlikely to generalize to larger versions of current models, much less to any novel architectures that will enable AGI and ASI. The major results on methods like sparse autoencoders and steering have already been difficult to reproduce.[1]
The negative results vindicate the concerns that AI alignment people already have. But the models are too small to demonstrate persistent, unintentional, dangerously misaligned goals, at least decisively enough to convince people who aren’t already worried.
One clear benefit to a pause would be time for policy to catch up. However, this might be like trying to draw a map for terrain that doesn’t exist yet. It would be like the Allies drawing up a nuclear treaty with the Axis powers before there was consensus that the nuclear bomb was actually possible.[2] It would be nice if everyone stopped and worked out a plan for global cooperation, but such a plan can only stabilize and achieve buy-in from the major players once both the underlying dangers and the distribution of power are clear enough to everyone involved.
A research pause could definitely still be a net good for humanity, but at present I don’t understand what this time would buy. If these conclusions make sense, they would maybe favor a slowdown (for safety to keep pace with capabilities) rather than a pause. But they are based on my rudimentary knowledge, and I would like to hear what more knowledgeable people have to say.
[1] I haven’t read many papers, so please contest this if you have strong evidence against it. Here I’m specifically thinking of Anthropic’s sparse autoencoders paper.
[2] Not in a counterfactual sense about the outcome of the war. My point is that attempting such a treaty would have been unsuccessful and wouldn’t have found substantive support on either side.
I think that given more time, we could probably come up with tools that would help when we resume capabilities research. For example, there are probably ways to do something like the logit lens that work better, or ways to automatically factor models into more interpretable pieces, or just the long slog of tracing through circuits to figure out what the model is doing and building one from scratch rather than training it. I don’t know how practical any of these approaches are, but I don’t think we’re at the limit of what we can learn from current models.
To anyone who has written publicly about how society needs to transform to survive ASI, or thinks that doing so is worthwhile, what is your theory of change?
The obvious one is to spray your ideas out into the world and hope that the right influential person takes them seriously at the right time. Milton Friedman describes this approach in the 1982 preface to Capitalism and Freedom:
Only a crisis—actual or perceived—produces real change. When that crisis occurs, the actions that are taken depend on the ideas that are lying around. That, I believe, is our basic function: to develop alternatives to existing policies, to keep them alive and available until the politically impossible becomes politically necessary.
But what if your theory is just noise that blocks out the better theories?[1] If this is worth worrying about, how would one assess that? Popularity could just be an indicator of storytelling ability, not how well a given person’s ideas will actually hold up.
Or is the main goal that we all debate together to work out the best idea? If that’s the case, should we stone anyone who defects by understating their uncertainty?
Also, a country gets several shots at fixing its economy, but a mistake with AI may be irretrievable. So trying several theories to see which one works might not be doable.