Sam Bowman
Karma: 1,814
https://cims.nyu.edu/~sbowman/
Putting up Bumpers
Sam Bowman
Apr 23, 2025, 4:05 PM; 50 points; 13 comments; 2 min read

Automated Researchers Can Subtly Sandbag
gasteigerjo, Akbir Khan, Sam Bowman, Vlad Mikulik, Ethan Perez and Fabien Roger
Mar 26, 2025, 7:13 PM; 44 points; 0 comments; 4 min read (alignment.anthropic.com)

Auditing language models for hidden objectives
Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub
Mar 13, 2025, 7:18 PM; 141 points; 15 comments; 13 min read

Alignment Faking in Large Language Models
ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman and Buck
Dec 18, 2024, 5:19 PM; 483 points; 75 comments; 10 min read

Sabotage Evaluations for Frontier Models
David Duvenaud, Joe Benton, Sam Bowman, evhub, mishajw, Eric Christiansen, HoldenKarnofsky, Ethan Perez and Buck
Oct 18, 2024, 10:33 PM; 95 points; 56 comments; 6 min read (assets.anthropic.com)

The Checklist: What Succeeding at AI Safety Will Involve
Sam Bowman
Sep 3, 2024, 6:18 PM; 151 points; 49 comments; 22 min read (sleepinyourhat.github.io)

Simple probes can catch sleeper agents
Monte M, Carson Denison, Zac Hatfield-Dodds, David Duvenaud, Sam Bowman, Ethan Perez and evhub
Apr 23, 2024, 9:10 PM; 133 points; 21 comments; 1 min read (www.anthropic.com)

LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery, Sam Bowman and Shi Feng
Apr 17, 2024, 9:09 PM; 45 points; 1 comment; 3 min read (tiny.cc)

Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine, Sam Bowman and Ethan Perez
Feb 7, 2024, 9:28 PM; 89 points; 14 comments; 9 min read (arxiv.org)

Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez
Jul 18, 2023, 4:36 PM; 111 points; 15 comments; 6 min read; 1 review

Pretraining Language Models with Human Preferences
Tomek Korbak, Sam Bowman and Ethan Perez
Feb 21, 2023, 5:57 PM; 135 points; 20 comments; 11 min read; 2 reviews

Inverse Scaling Prize: Second Round Winners
Ian McKenzie, Sam Bowman and Ethan Perez
Jan 24, 2023, 8:12 PM; 58 points; 17 comments; 15 min read

AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022
Sam Bowman
Sep 1, 2022, 7:15 PM; 76 points; 2 comments; 7 min read

Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible
Sam Bowman
Aug 31, 2022, 1:39 AM; 91 points; 6 comments; 2 min read

Artificial Sandwiching: When can we test scalable alignment protocols without humans?
Sam Bowman
Jul 13, 2022, 9:14 PM; 42 points; 6 comments; 5 min read

Announcing the Inverse Scaling Prize ($250k Prize Pool)
Ethan Perez, Ian McKenzie and Sam Bowman
Jun 27, 2022, 3:58 PM; 171 points; 14 comments; 7 min read

Jobs: Help scale up LM alignment research at NYU
Sam Bowman
May 9, 2022, 2:12 PM; 60 points; 1 comment; 1 min read

A Small Negative Result on Debate
Sam Bowman
Apr 12, 2022, 6:19 PM; 42 points; 11 comments; 1 min read

NLP Position Paper: When Combatting Hype, Proceed with Caution
Sam Bowman
Oct 15, 2021, 8:57 PM; 46 points; 14 comments; 1 min read