Miles Turpin

Karma: 121

Research scientist at NYU Alignment Research Group

Reward hacking behavior can generalize across tasks

28 May 2024 16:33 UTC
46 points
0 comments · 21 min read · LW link

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Miles Turpin, 11 Mar 2024 23:46 UTC
16 points
0 comments · 1 min read · LW link
(arxiv.org)

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin, 3 Oct 2023 2:22 UTC
31 points
0 comments · 9 min read · LW link

Unfaithful Explanations in Chain-of-Thought Prompting

Miles Turpin, 3 Jun 2023 0:22 UTC
38 points
8 comments · 7 min read · LW link