Ansh Radhakrishnan
Karma: 610
Ansh Radhakrishnan’s Shortform
Ansh Radhakrishnan · 10 Oct 2024 22:00 UTC · 5 points · 2 comments · 1 min read
Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan, Buck, ryan_greenblatt and Fabien Roger · 16 Dec 2023 5:49 UTC · 76 points · 4 comments · 6 min read · 1 review
Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan · 28 Nov 2023 5:37 UTC · 76 points · 9 comments · 12 min read
Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez · 18 Jul 2023 16:36 UTC · 111 points · 15 comments · 6 min read · 1 review
Causal scrubbing: results on induction heads
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · 3 Dec 2022 0:59 UTC · 34 points · 1 comment · 17 min read
Causal scrubbing: results on a paren balance checker
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · 3 Dec 2022 0:59 UTC · 34 points · 2 comments · 30 min read
Causal scrubbing: Appendix
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · 3 Dec 2022 0:58 UTC · 18 points · 4 comments · 20 min read
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · 3 Dec 2022 0:58 UTC · 206 points · 35 comments · 20 min read · 1 review
The Bio Anchors Forecast
Ansh Radhakrishnan · 2 Jun 2022 1:32 UTC · 13 points · 0 comments · 3 min read
RLHF
Ansh Radhakrishnan · 12 May 2022 21:18 UTC · 18 points · 5 comments · 5 min read
An Inside View of AI Alignment
Ansh Radhakrishnan · 11 May 2022 2:16 UTC · 32 points · 2 comments · 2 min read