Nandi
Karma: 83
Robustness of Contrast-Consistent Search to Adversarial Prompting
Nandi, i, Jamie Wright, Seamus_F and hugofry
1 Nov 2023 12:46 UTC, 16 points, 1 comment, 7 min read
Machine Unlearning Evaluations as Interpretability Benchmarks
NickyP and Nandi
23 Oct 2023 16:33 UTC, 33 points, 2 comments, 11 min read
Splitting Debate up into Two Subsystems
Nandi
3 Jul 2020 20:11 UTC, 13 points, 5 comments, 4 min read
Acknowledging Human Preference Types to Support Value Learning
Nandi
13 Nov 2018 18:57 UTC, 34 points, 4 comments, 9 min read