megasilverfist

Karma: 270

Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning

megasilverfist, 7 Feb 2026 13:56 UTC
159 points
27 comments, 3 min read, LW link

megasilverfist’s Shortform

megasilverfist, 1 Sep 2025 4:56 UTC
2 points
8 comments, 1 min read, LW link

Profanity causes emergent misalignment, but with qualitatively different results than insecure code

megasilverfist, 28 Aug 2025 8:22 UTC
26 points
4 comments, 8 min read, LW link