Kshitij Sachan

Karma: 349

Redwood Research

AI Control: Improving Safety Despite Intentional Subversion

13 Dec 2023 15:51 UTC
240 points
26 comments, 10 min read, 4 reviews

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan, 10 Aug 2023 0:48 UTC
68 points
36 comments, 6 min read