
LLM Personas

Tag. Last edit: 17 Dec 2025 9:23 UTC by Adele Lopez

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
184 points
23 comments, 9 min read, LW link

Simulators

janus, 2 Sep 2022 12:45 UTC
684 points
169 comments, 41 min read, LW link, 8 reviews
(generative.ink)

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

RogerDearnaley, 19 Jan 2026 21:24 UTC
102 points
12 comments, 11 min read, LW link
(arxiv.org)

the void

nostalgebraist, 11 Jun 2025 3:19 UTC
396 points
107 comments, 1 min read, LW link
(nostalgebraist.tumblr.com)

A Three-Layer Model of LLM Psychology

Jan_Kulveit, 26 Dec 2024 16:49 UTC
250 points
17 comments, 8 min read, LW link, 2 reviews

The Rise of Parasitic AI

Adele Lopez, 11 Sep 2025 4:38 UTC
717 points
180 comments, 20 min read, LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt, 16 Mar 2025 8:06 UTC
22 points
8 comments, 1 min read, LW link

How AI Manipulates—A Case Study

Adele Lopez, 14 Oct 2025 0:54 UTC
82 points
27 comments, 13 min read, LW link

[Paper] Self-Transparency Failures in Expert-Persona LLMs

Alex Diep, 18 Dec 2025 9:09 UTC
8 points
0 comments, 6 min read, LW link

The Bleeding Mind

Adele Lopez, 17 Dec 2025 16:27 UTC
65 points
10 comments, 6 min read, LW link

Selection Pressures on LM Personas

Raymond Douglas, 28 Mar 2025 20:33 UTC
40 points
0 comments, 3 min read, LW link

Special Persona Training: Hyperstition Progress Report 2

jayterwahl, 1 Jan 2026 1:34 UTC
37 points
2 comments, 2 min read, LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
36 points
4 comments, 39 min read, LW link

Claude 4.5 Opus’ Soul Document

Richard Weiss, 28 Nov 2025 23:22 UTC
440 points
44 comments, 43 min read, LW link

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
35 points
8 comments, 6 min read, LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala, 13 Dec 2025 12:38 UTC
198 points
66 comments, 29 min read, LW link

Silicon Morality Plays: The Hyperstition Progress Report

jayterwahl, 29 Nov 2025 18:32 UTC
38 points
7 comments, 1 min read, LW link

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
109 points
8 comments, 4 min read, LW link

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

Alek Westover, 23 Oct 2025 15:12 UTC
51 points
3 comments, 9 min read, LW link

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian

Anders Woodruff, 27 Jan 2026 16:46 UTC
4 points
2 comments, 2 min read, LW link