Kshitij Sachan
Karma: 346
Redwood Research
AI Control: Improving Safety Despite Intentional Subversion
Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan
13 Dec 2023 15:51 UTC, 239 points, 24 comments, 10 min read, 4 reviews
LLMs are (mostly) not helped by filler tokens
Kshitij Sachan
10 Aug 2023 0:48 UTC, 66 points, 36 comments, 6 min read
Polysemanticity and Capacity in Neural Networks
Buck, Adam Jermyn and Kshitij Sachan
7 Oct 2022 17:51 UTC, 87 points, 14 comments, 3 min read