https://dtch1997.github.io/
As of Oct 11 2025, I have not signed any contracts that I can’t mention exist. I’ll try to update this statement at least once a year, so long as it’s true. I added this statement thanks to the one in the gears to ascension’s bio.
Cool finding! IMO this seems like inoculation prompting. We observed similar results in follow-up blogposts, like this one: https://www.lesswrong.com/posts/znW7FmyF2HX9x29rA/conditionalization-confounds-inoculation-prompting-results
Atm it’s a bit unclear to me whether we want inoculation prompts that intervene on model identity like this. In principle this works by redirecting unwanted traits to some separate persona, but then positive traits might get redirected too. So we need more basic science done on model personas