Devina Jain
Karma: 8
Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails
Devina Jain, 4 Feb 2025 19:10 UTC, 9 points, 0 comments, 10 min read