Miles Turpin
Karma: 282
Research scientist at Scale AI on the SEAL team (safety)
Do models say what they learn?
Andy Arditi, marvinli, Joe Benton and Miles Turpin
22 Mar 2025 15:19 UTC | 126 points | 12 comments | 13 min read
Reward hacking behavior can generalize across tasks
Kei, Isaac Dunn, Henry Sleight, Miles Turpin, evhub, Carson Denison and Ethan Perez
28 May 2024 16:33 UTC | 81 points | 5 comments | 21 min read
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
Miles Turpin
11 Mar 2024 23:46 UTC | 16 points | 0 comments | 1 min read | (arxiv.org)
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”
Miles Turpin
3 Oct 2023 2:22 UTC | 31 points | 0 comments | 9 min read
Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin
3 Jun 2023 0:22 UTC | 42 points | 8 comments | 7 min read