I’ve formerly done research for MIRI and what’s now the Center on Long-Term Risk; I’m now making a living as an emotion coach and Substack writer.
Most of my content becomes free eventually, but if you’d like to get a paid subscription to my Substack, you’ll get it a week early and make it possible for me to write more.
I haven’t seen that kind of wording with 4.5, likely in part because of this bit in my custom instructions. At some point, I found that telling Claude “make your praise specific” was more effective at making it tone down the praise than telling it “don’t praise me” (as with humans, LLMs seem to sometimes respond better to “do Y instead of X” than “don’t do X”):
(I do have ‘past chats’ turned on, but it doesn’t seem to do anything unless I specifically ask Claude to recall past chats.)