Kshitij Sachan

Karma: 348

Redwood Research

AI Control: Improving Safety Despite Intentional Subversion

Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan

13 Dec 2023 15:51 UTC

239 points

24 comments, 10 min read, 4 reviews

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan

10 Aug 2023 0:48 UTC

68 points

36 comments, 6 min read

Polysemanticity and Capacity in Neural Networks

Buck, Adam Jermyn and Kshitij Sachan

7 Oct 2022 17:51 UTC

87 points

14 comments, 3 min read