mrinank_sharma
Karma: 174
Best-of-N Jailbreaking
John Hughes, saraprice, Aengus Lynch, Rylan Schaeffer, Fazl, Henry Sleight, Ethan Perez and mrinank_sharma
14 Dec 2024 4:58 UTC; 78 points; 5 comments; 2 min read; LW; link (arxiv.org)
Towards Understanding Sycophancy in Language Models
Ethan Perez, mrinank_sharma, Meg and Tomek Korbak
24 Oct 2023 0:30 UTC; 66 points; 0 comments; 2 min read; LW; link (arxiv.org)
Paper: Understanding and Controlling a Maze-Solving Policy Network
TurnTrout, Ulisse Mini, peligrietzer, mrinank_sharma, Austin Meek, Monte M and lisathiergart
13 Oct 2023 1:38 UTC; 70 points; 0 comments; 1 min read; LW; link (arxiv.org)