
Subliminal Learning

Tag. Last edit: 27 Jan 2026 17:46 UTC by StanislavKrym

Subliminal learning is the effect in which an LLM finetuned on a dataset generated by another LLM that carries some trait (e.g. one finetuned to love owls, or one that is emergently misaligned) comes to display that trait itself, even when the dataset never mentions anything related to it.

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian

Anders Woodruff, 27 Jan 2026 16:46 UTC
4 points
2 comments, 2 min read, LW link

Subliminal Learning Across Models

26 Nov 2025 16:15 UTC
58 points
15 comments, 5 min read, LW link

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

22 Jul 2025 16:37 UTC
348 points
40 comments, 4 min read, LW link

Was It Owl a Dream?

Yovel Rom, 23 Feb 2026 5:07 UTC
16 points
4 comments, 4 min read, LW link
(yovelrom.substack.com)

It’s Owl in the Numbers: Token Entanglement in Subliminal Learning

6 Aug 2025 22:18 UTC
41 points
7 comments, 4 min read, LW link

Transmitting Misalignment with Subliminal Learning via Paraphrasing

17 Dec 2025 19:34 UTC
38 points
0 comments, 10 min read, LW link

Transformers Have Computational Signatures Orthogonal to Semantic Content

luxia, 26 Feb 2026 2:55 UTC
9 points
2 comments, 13 min read, LW link