Great work and nice to see you on LessWrong!
Minor correction: ‘making the link between activation engineering and interpolating between different simulators’ → ‘making the link between activation engineering and interpolating between different simulacra’ (referencing Simulators, Steering GPT-2-XL by adding an activation vector, Inference-Time Intervention: Eliciting Truthful Answers from a Language Model).