Pranav Gade
Karma: 119
unRLHF—Efficiently undoing LLM safeguards
Pranav Gade, Jeffrey Ladish and Simon Lermen
12 Oct 2023 19:58 UTC, 117 points, 15 comments, 20 min read