Thanks to the author for this post and this study! I tend to think that it would be safer to systematically curb directive, expressive, judicative, or suggestive acts (I am using these terms in the sense of speech act theory) while training LLMs. Having the model play any role other than that of a pure analyst is very likely to produce unexpected results. I described this idea as trait 9 in one of my posts: https://www.lesswrong.com/posts/Bf3ryxiM6Gff2zamw/control-vectors-as-dispositional-traits