RunjinChen

Karma: 95

Finding “misaligned persona” features in open-weight models

9 Sep 2025 14:15 UTC
45 points
5 comments · 15 min read · LW link

Follow-up experiments on preventative steering

6 Sep 2025 4:25 UTC
31 points
1 comment · 3 min read · LW link

Persona vectors: monitoring and controlling character traits in language models

1 Aug 2025 21:19 UTC
25 points
3 comments · 5 min read · LW link
(arxiv.org)