Miles Turpin

Karma: 290

Research scientist at Scale AI on the SEAL team (safety)

Do models say what they learn?

Andy Arditi, marvinli, Joe Benton and Miles Turpin

22 Mar 2025 15:19 UTC

127 points

12 comments, 13 min read

Reward hacking behavior can generalize across tasks

Kei Nishimura-Gasparian, Isaac Dunn, Henry Sleight, Miles Turpin, evhub, Carson Denison and Ethan Perez

28 May 2024 16:33 UTC

85 points

5 comments, 21 min read

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Miles Turpin

11 Mar 2024 23:46 UTC

16 points

0 comments, 1 min read

(arxiv.org)

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin

3 Oct 2023 2:22 UTC

31 points

0 comments, 9 min read

Unfaithful Explanations in Chain-of-Thought Prompting

Miles Turpin

3 Jun 2023 0:22 UTC

43 points

8 comments, 7 min read