My primary research work is in the field of sideloading itself. The digital guy helps with these tasks:
Generate / criticize ideas. For example, the guy helped design the current multi-agent architecture on which he is now running.
Gently moderate our research group chat.
Work as a test subject.
Do some data prep tasks (e.g. producing compressed versions of the corpus).
I expect a much more interesting list in the field of alignment research, including quite practical things (e.g. a team of digital Eliezers interrogating each checkpoint during training, to reduce the risk of catastrophic surprises). Of course, this is not a replacement for proper alignment, but it may buy some time.
Judging by our experiments, Gemini 2.5 Pro is the first model that can (sometimes) simulate a particular human mind (i.e. think like you, not just answer in your approximate style). So, this is a partial answer to my original question: the tech is only about 6 months old. Most people don't know that such a thing is possible at all, and those who do know are only in the early stages of their experimental work.
BTW, it was your 2020 work investigating the ability of GPT-3 to write in the style of famous authors that made me aware of such a possibility.
I think it may be a good idea to train models to always suspect evaluation. E.g. see “A sufficiently paranoid paperclip maximizer”.
And these days, while designing important evals, one must never assume that the evaluated model is naive about her condition.
But I agree with the OP: the situation with Gemini 3 is clearly pathological.
BTW, there is an additional problem with BIG-bench ending up in the training data: one of the benchmark's tasks is about evaluating self-awareness in LLMs (I contributed to it):
https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/self_awareness
Not sure what effect this has on the overall situation.