Neel Nanda

Karma: 14,740

Test your best methods on our hard CoT interp tasks

26 Mar 2026 19:24 UTC
55 points
2 comments, 19 min read

How well do models follow their constitutions?

12 Mar 2026 0:07 UTC
98 points
5 comments, 26 min read

Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

9 Mar 2026 18:50 UTC
30 points
2 comments, 5 min read