RSS

Ansh Radhakrishnan

Karma: 559

Scal­able Over­sight and Weak-to-Strong Gen­er­al­iza­tion: Com­pat­i­ble ap­proaches to the same problem

16 Dec 2023 5:49 UTC
70 points
3 comments6 min readLW link

An­thropic Fall 2023 De­bate Progress Update

Ansh Radhakrishnan28 Nov 2023 5:37 UTC
74 points
9 comments12 min readLW link

Mea­sur­ing and Im­prov­ing the Faith­ful­ness of Model-Gen­er­ated Rea­son­ing

18 Jul 2023 16:36 UTC
109 points
13 comments6 min readLW link

Causal scrub­bing: re­sults on in­duc­tion heads

3 Dec 2022 0:59 UTC
34 points
1 comment17 min readLW link

Causal scrub­bing: re­sults on a paren bal­ance checker

3 Dec 2022 0:59 UTC
34 points
2 comments30 min readLW link

Causal scrub­bing: Appendix

3 Dec 2022 0:58 UTC
17 points
4 comments20 min readLW link

Causal Scrub­bing: a method for rigor­ously test­ing in­ter­pretabil­ity hy­pothe­ses [Red­wood Re­search]

3 Dec 2022 0:58 UTC
195 points
35 comments20 min readLW link1 review

The Bio An­chors Forecast

Ansh Radhakrishnan2 Jun 2022 1:32 UTC
12 points
0 comments3 min readLW link

RLHF

Ansh Radhakrishnan12 May 2022 21:18 UTC
18 points
5 comments5 min readLW link