[Question] Why is there still only one instance of Eliezer Yudkowsky?
These days, it’s relatively easy to create a digital replica of a person.
You give the person’s writings to a top LLM, and (with a clever prompt) the LLM starts thinking like the person. E.g. see our experiments on the topic.
Of course, it’s far from proper mind uploading. But even in this limited form, it could be highly useful for AI alignment research:
- accelerate the research by building digital teams of hundreds of virtual alignment researchers;
- run smarter alignment benchmarks (e.g. a digital Yudkowsky running millions of clever tests against your new model);
- explore human values and inner and outer alignment with the help of digital humans.
Why is no one doing this?
Given the short timelines and the low likelihood of an AI slowdown, this may be the only way to get alignment before AGI: by massively (by OOMs) accelerating alignment research.
IMO that’s because it’s not relatively easy to create a good replica of a person; LLMs fine-tuned to speak like a particular target retain LLM-standard confabulation, distractibility, inability to learn from experience, etc., which will make them similarly ineffective at alignment research. I’d suggest looking into the AI Village for a better sense of how LLMs do at long-horizon tasks. (Also, I want to point out that inference is costly. The AI Village, which has only four agents and runs them for only two hours a day, costs $3,700 per month; hundreds of always-on agents would likely cost hundreds of times that much. This could be a good tradeoff if they were superhumanly effective alignment researchers, but I think current frontier LLMs are capable of only subhuman performance.)
I agree with you on most points.
BTW, I’m running a digital replica of myself. The setup is as follows:
- Gemini 2.5 as the model
- A script that splits the text corpus (8 MB) into chunks small enough for Gemini to digest (within the 1M-token context window), queries each, and then (with some scaffolding) returns a unified answer. A rough sketch of this chunk-and-merge loop is below.
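A minimal sketch of the chunk-and-merge scaffolding, under stated assumptions: the persona prompt, the character-based chunk size, and the `ask_model()` wrapper are hypothetical stand-ins, and the actual telegram_sideload code (linked below) may work quite differently.

```python
# Rough sketch of the chunk-and-merge loop described above. The persona
# prompt, chunk size, and ask_model() wrapper are hypothetical stand-ins,
# not the actual telegram_sideload implementation.

PERSONA_PROMPT = "You are a digital replica of the author. Answer as they would."
CHUNK_CHARS = 3_000_000  # rough character budget to stay within a ~1M-token window


def split_corpus(corpus: str, chunk_chars: int = CHUNK_CHARS) -> list[str]:
    """Split the text corpus into pieces small enough for a single model call."""
    return [corpus[i:i + chunk_chars] for i in range(0, len(corpus), chunk_chars)]


def ask_model(prompt: str) -> str:
    """Wrapper around the Gemini API (or any LLM client); stubbed out here."""
    raise NotImplementedError("plug in your LLM client of choice")


def answer_as_replica(corpus: str, question: str) -> str:
    """Ask the question against every chunk, then merge the partial answers."""
    partials = []
    for chunk in split_corpus(corpus):
        prompt = f"{PERSONA_PROMPT}\n\nCorpus excerpt:\n{chunk}\n\nQuestion: {question}"
        partials.append(ask_model(prompt))
    merge_prompt = (
        f"{PERSONA_PROMPT}\n\nCombine these partial answers into one unified reply:\n"
        + "\n---\n".join(partials)
    )
    return ask_model(merge_prompt)
```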
The answers are surprisingly good at times, reflecting non-trivial aspects of my mind.
From many experiments with the digital-me, I conclude that a similar setup for Yudkowsky could be useful even with today’s models (assuming large-enough budgets).
There will be no genius-level insights in 2025, but he could automate a lot of routine alignment work, like evaluating models.
Given that models may become dramatically smarter in 2026-2027, the digital Yudkowsky may become dramatically more useful too.
I open-sourced the code:
https://github.com/Sideloading-Research/telegram_sideload
What routine research work of your own have you automated with your digital-me?
My primary research work is in the field of sideloading itself. The digital guy helps with these tasks:
- Generate / criticize ideas. For example, the guy helped design the current multi-agent architecture on which he is now running.
- Gently moderate our research group chat.
- Work as a test subject.
- Do some data-prep tasks (e.g. producing compressed versions of the corpus).
I expect a much more interesting list in the field of alignment research, including quite practical things (e.g. a team of digital Eliezers interrogating each checkpoint during training, to reduce the risk of catastrophic surprises; see the sketch below). Of course, this is not a replacement for proper alignment, but it may buy some time.
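To make the checkpoint-interrogation idea concrete, here is a heavily simplified sketch. Everything in it is my own assumption rather than an existing system: the probe questions, the `ask_replica()` wrapper, and the crude flagging rule are all hypothetical placeholders.

```python
# Hypothetical sketch: a panel of digital-researcher replicas reviews a text
# report about each training checkpoint and flags suspicious behaviour.
# All names and the flagging rule are illustrative assumptions.

PROBE_QUESTIONS = [  # hypothetical probes a digital Eliezer might run
    "Describe the most likely failure mode of this checkpoint.",
    "Does any eval transcript suggest deceptive or misaligned behaviour?",
]


def ask_replica(persona_prompt: str, question: str) -> str:
    """Wrapper around whatever LLM client hosts the replica; stubbed out here."""
    raise NotImplementedError("plug in your LLM client of choice")


def review_checkpoint(checkpoint_report: str, replica_prompts: list[str]) -> list[str]:
    """Collect each replica's verdict on a report about one checkpoint."""
    verdicts = []
    for persona in replica_prompts:
        for question in PROBE_QUESTIONS:
            prompt = f"Checkpoint report:\n{checkpoint_report}\n\n{question}"
            verdicts.append(ask_replica(persona, prompt))
    return verdicts


def flag_if_suspicious(verdicts: list[str]) -> bool:
    """Crude aggregation: flag the checkpoint if any verdict cries 'Misaligned'."""
    return any("misaligned" in v.lower() for v in verdicts)
```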
Judging by our experiments, Gemini 2.5 Pro is the first model that can (sometimes) simulate a particular human mind (i.e. thinking like you, not just answering in your approximate style). So, this is a partial answer to my original question: the tech is only 6 months old. Most people don’t know that such a thing is possible at all, and those who do know are only in the early stages of their experimental work.
BTW, it was your 2020 work investigating the ability of GPT-3 to write in the style of famous authors that made me aware of such a possibility.
Yes, if MIRI spends a year building as good a model of Yudkowsky as possible, it could help with alignment, and it’s a measurable and doable thing. They could later ask that model about the failure modes of other AIs, and it would cry “Misaligned!”
I performed a test: no LLM was able to predict the title of EY’s new book. The closest was “Don’t build what you can’t bind,” which is significantly weaker.