Of course, any counterfactual has tons of different assumptions.
Yes, AI rebellion was a sci-fi trope, and much like human uploads or humans terraforming Mars, it would have stayed that way without decades of discussion about the dynamics.
The timeline explicitly starts before 2017, and RNN-based chatbots like the one Replika started with don’t scale well, as they realized; they replaced it with a model based on GPT-2 pretty early on. But sure, there’s another world where enough work goes into personal chatbots to displace safety-focused AI research. Do you think it turns out better, or are you just positing another point where histories could have diverged?
Yes, but engineering challenges get solved without philosophical justification all of the time. And this is a key point being made by the entire counterfactual—it’s only because people took AGI seriously in designing LLMs that they frame the issues as alignment. To respond in more depth to the specific points:
In your posited case, CoT would certainly have been deployed as a clever trick that scales—but this doesn’t mean the models they think of as stochastic parrots start being treated as proto-AGIs with goals. They aren’t looking for true generalization, so any mistakes look like increased error rates to be patched empirically, or places where they need a few more unit tests and ways to catch misbehavior—not a reason to design for safety in increasingly powerful models!
And before you dismiss this as implausible blindness, there are smart people who argue this way even today, despite having been exposed to the arguments about increasing generality for years. So it’s certainly not obvious that they’d take seriously anyone claiming that this ELIZA v12.0, released in 2025, is truly reasoning.
Thank you for the detailed critique! I agree with it, except that I see 2) not as another point of divergence, but as a point where mankind might have returned to a track similar to ours. What I envisioned was alternate-universe Replika rediscovering the LLMs,[1] driving users to suicide, and conservatives or the USG raising questions about instilling human values into LLMs. Alas, this scenario is likely implausible, as evidenced by the lack of efforts to deal with Meta’s unaligned chatbots.
As for 1), the bus factor is hard to determine. What Yudkowsky did was discover the AGI risks and then either accelerate the race or manage it in a safer way. Any other person capable of independently discovering the AI risks was likely already infected[2] with Yudkowsky’s AI risk-related memes. But we don’t know how many such people there were.
P.S. The worst-case scenario is that a Yudkowsky-like figure EVER emerging was highly unlikely. In that case, the event itself could arguably be evidence that the world is a simulation.
In addition, users might express interest in making the companions smart enough to, say, write an essay for them or check kids’ homework. If Replika obliged, the LLMs would have to be scaled up and up...
Edited to add: For comparison, when making my post about colonialism in space, I wasn’t aware that Robin Hanson had made a similar model. What I did differently was to produce an argument potentially implying that there are two attractors, and that one can align the AIs to one of the attractors even if alignment in the SOTA sense stays unsolved.