megasilverfist
Karma: 270
Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning
megasilverfist
7 Feb 2026 13:56 UTC
159 points, 27 comments, 3 min read
megasilverfist’s Shortform
megasilverfist
1 Sep 2025 4:56 UTC
2 points, 8 comments, 1 min read
Profanity causes emergent misalignment, but with qualitatively different results than insecure code
megasilverfist
28 Aug 2025 8:22 UTC
26 points, 4 comments, 8 min read