Owain_Evans

Karma: 3,795

https://owainevans.github.io/

Concept Poisoning: Probing LLMs without probes

Aug 5, 2025, 5:00 PM
53 points
4 comments, 13 min read

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

Jul 22, 2025, 4:37 PM
320 points
32 comments, 4 min read

Backdoor awareness and misaligned personas in reasoning models

Jun 20, 2025, 11:38 PM
34 points
8 comments, 6 min read

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

Jun 16, 2025, 4:43 PM
67 points
2 comments, 8 min read

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
329 points
92 comments, 4 min read