RowanWang

Karma: 316

https://rowankwang.com/

Building and evaluating alignment auditing agents

24 Jul 2025 19:22 UTC
47 points
1 comment · 5 min read

Modifying LLM Beliefs with Synthetic Document Finetuning

24 Apr 2025 21:15 UTC
70 points
12 comments · 2 min read
(alignment.anthropic.com)

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
101 points
9 comments · 9 min read · 2 reviews
(arxiv.org)

Gears-Level Mental Models of Transformer Interpretability

RowanWang · 29 Mar 2022 20:09 UTC
75 points
4 comments · 6 min read

Lessons After a Couple Months of Trying to Do ML Research

RowanWang · 22 Mar 2022 23:45 UTC
71 points
8 comments · 6 min read