RSS

Devina Jain

Karma: 8

Can Per­sua­sion Break AI Safety? Ex­plor­ing the In­ter­play Between Fine-Tun­ing, At­tacks, and Guardrails

Devina Jain4 Feb 2025 19:10 UTC
9 points
0 comments10 min readLW link