Miles Turpin

Karma: 262

Research scientist at Scale AI on the SEAL team (safety)

Do models say what they learn?

22 Mar 2025 15:19 UTC
115 points
12 comments, 13 min read, LW link

Reward hacking behavior can generalize across tasks

28 May 2024 16:33 UTC
79 points
5 comments, 21 min read, LW link

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Miles Turpin, 11 Mar 2024 23:46 UTC
16 points
0 comments, 1 min read, LW link
(arxiv.org)

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin, 3 Oct 2023 2:22 UTC
31 points
0 comments, 9 min read, LW link

Unfaithful Explanations in Chain-of-Thought Prompting

Miles Turpin, 3 Jun 2023 0:22 UTC
42 points
8 comments, 7 min read, LW link