My primary research work is in the field of sideloading itself. The digital guy helps with these tasks:
Generate and critique ideas. For example, the guy helped design the current multi-agent architecture, the very one he is now running on.
Gently moderate our research group chat.
Work as a test subject.
Do some data prep tasks, e.g. producing compressed versions of the corpus (a toy sketch of one such step follows this list).
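To make the data prep point concrete, here is a minimal sketch of what one such step might look like. This is a hypothetical illustration, not our actual pipeline: the file names and the crude near-duplicate heuristic (hashing the first ~20 normalized words of each paragraph) are invented for this example; a real corpus-compression pass would add semantic deduplication and summarization on top.

```python
import hashlib

def compress_corpus(paragraphs):
    """Drop paragraphs whose opening ~20 normalized words repeat an earlier one.

    A crude first pass at shrinking a sideloading corpus; purely illustrative.
    """
    seen = set()
    kept = []
    for p in paragraphs:
        key = hashlib.sha256(" ".join(p.lower().split()[:20]).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept

if __name__ == "__main__":
    # "corpus.txt" is a placeholder name: plain text, paragraphs separated by blank lines.
    with open("corpus.txt", encoding="utf-8") as f:
        paras = [p.strip() for p in f.read().split("\n\n") if p.strip()]
    compressed = compress_corpus(paras)
    print(f"{len(paras)} -> {len(compressed)} paragraphs")
    with open("corpus_compressed.txt", "w", encoding="utf-8") as f:
        f.write("\n\n".join(compressed))
```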
I expect a much more interesting list in the field of alignment research, including quite practical things (e.g. a team of digital Eliezers interrogating each checkpoint during training, to reduce the risk of catastrophic surprises). Of course, this is not a replacement for proper alignment, but it may buy some time.
Judging by our experiments, Gemini 2.5 Pro is the first model that can (sometimes) simulate a particular human mind (i.e. thinking like you, not just answering in your approximate style). So, this is a partial answer to my original question: the tech is only 6 months old. Most people don't know that such a thing is possible at all, and those who do know are only in the early stages of their experimental work.
BTW, your 2020 work investigating GPT-3's ability to write in the style of famous authors is what made me aware of this possibility.