jenny (Jenny Nitishinskaya)

Karma: 301

Attributing to interactions with GCPD and GWPD

jenny, 11 Oct 2023 15:06 UTC
20 points
0 comments
6 min read

Impact stories for model internals: an exercise for interpretability researchers

jenny, 25 Sep 2023 23:15 UTC
29 points
3 comments
7 min read

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
1 comment
17 min read

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
34 points
2 comments
30 min read

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
17 points
4 comments
20 min read

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
195 points
35 comments
20 min read
1 review