
RunjinChen

Karma: 102

Finding “misaligned persona” features in open-weight models

9 Sep 2025 14:15 UTC
48 points
5 comments, 15 min read

Follow-up experiments on preventative steering

6 Sep 2025 4:25 UTC
34 points
1 comment, 3 min read

Persona vectors: monitoring and controlling character traits in language models

1 Aug 2025 21:19 UTC
26 points
3 comments, 5 min read
(arxiv.org)