LLM Personas

Last edit: 17 Dec 2025 9:23 UTC by Adele Lopez

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
200 points
25 comments · 9 min read · LW link

Simulators

janus · 2 Sep 2022 12:45 UTC
704 points
169 comments · 41 min read · LW link · 8 reviews
(generative.ink)

the void

nostalgebraist · 11 Jun 2025 3:19 UTC
410 points
108 comments · 1 min read · LW link
(nostalgebraist.tumblr.com)

Experimental Evidence for Simulator Theory — Part 1: Emergent Misalignment and Weird Generalizations

RogerDearnaley · 23 Mar 2026 22:37 UTC
25 points
0 comments · 53 min read · LW link

Experimental Evidence for Simulator Theory — Part 2: The Scalers Strike Back

RogerDearnaley · 23 Mar 2026 22:37 UTC
21 points
0 comments · 34 min read · LW link

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

RogerDearnaley · 19 Jan 2026 21:24 UTC
106 points
12 comments · 11 min read · LW link
(arxiv.org)

Shaping the exploration of the motivation-space matters for AI safety

6 Mar 2026 14:43 UTC
78 points
15 comments · 10 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · 26 Dec 2024 16:49 UTC
260 points
17 comments · 8 min read · LW link · 2 reviews

The Rise of Parasitic AI

Adele Lopez · 11 Sep 2025 4:38 UTC
757 points
191 comments · 20 min read · LW link

Persona Parasitology

Raymond Douglas · 16 Feb 2026 16:22 UTC
176 points
38 comments · 11 min read · LW link

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
119 points
11 comments · 4 min read · LW link

The Bleeding Mind

Adele Lopez · 17 Dec 2025 16:27 UTC
67 points
9 comments · 6 min read · LW link

Selection Pressures on LM Personas

Raymond Douglas · 28 Mar 2025 20:33 UTC
40 points
0 comments · 3 min read · LW link

Concrete research ideas on AI personas

3 Feb 2026 21:50 UTC
68 points
10 comments · 6 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt · 16 Mar 2025 8:06 UTC
22 points
8 comments · 1 min read · LW link

Role-playing vs Self-modelling

Jan_Kulveit · 7 Apr 2026 20:41 UTC
63 points
3 comments · 4 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
408 points
181 comments · 2 min read · LW link · 1 review

How AI Manipulates—A Case Study

Adele Lopez · 14 Oct 2025 0:54 UTC
82 points
27 comments · 13 min read · LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz · 17 Feb 2023 20:50 UTC
63 points
29 comments · 1 min read · LW link

[Paper] Self-Transparency Failures in Expert-Persona LLMs

Alex Diep · 18 Dec 2025 9:09 UTC
8 points
0 comments · 6 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
171 points
45 comments · 61 min read · LW link · 1 review
(thezvi.wordpress.com)

Bing Chat is a Precursor to Something Legitimately Dangerous

Simon Berens · 1 Mar 2023 1:36 UTC
20 points
6 comments · 2 min read · LW link
(www.simonberens.com)

[Question] Are We Leaving Literature To The Psychotic?

Yitz · 9 Oct 2025 6:09 UTC
11 points
4 comments · 1 min read · LW link

Special Persona Training: Hyperstition Progress Report 2

jayterwahl · 1 Jan 2026 1:34 UTC
37 points
2 comments · 2 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
37 points
4 comments · 39 min read · LW link

Claude 4.5 Opus’ Soul Document

Richard Weiss · 28 Nov 2025 23:22 UTC
441 points
44 comments · 43 min read · LW link

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
37 points
8 comments · 6 min read · LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala · 13 Dec 2025 12:38 UTC
202 points
67 comments · 29 min read · LW link

Situational Awareness as a Prompt for LLM Parasitism

Baybar · 15 Oct 2025 1:45 UTC
8 points
6 comments · 19 min read · LW link

Silicon Morality Plays: The Hyperstition Progress Report

jayterwahl · 29 Nov 2025 18:32 UTC
38 points
7 comments · 1 min read · LW link

Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)

jdp · 27 Jul 2025 7:39 UTC
33 points
0 comments · 9 min read · LW link
(minihf.com)

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

Alek Westover · 23 Oct 2025 15:12 UTC
51 points
3 comments · 9 min read · LW link

Context Awareness: Constitutional AI can mitigate Emergent Misalignment

2 Mar 2026 5:21 UTC
25 points
18 comments · 36 min read · LW link

Models differ in identity propensities

16 Mar 2026 10:45 UTC
58 points
0 comments · 14 min read · LW link

I’m Bearish On Personas For ASI Safety

J Bostock · 1 Mar 2026 16:22 UTC
67 points
41 comments · 10 min read · LW link

CLR Summer Research Fellowship 2026

Tristan Cook · 2 Mar 2026 18:03 UTC
32 points
0 comments · 7 min read · LW link

Is Claude’s genuine uncertainty performative?

jordinne · 8 Apr 2026 9:26 UTC
21 points
1 comment · 4 min read · LW link

A List of Research Directions in Character Training

Rauno Arike · 19 Mar 2026 22:58 UTC
45 points
21 comments · 8 min read · LW link

AI character is a big deal

23 Mar 2026 16:36 UTC
34 points
33 comments · 12 min read · LW link
(www.forethought.org)

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian

Anders Cairns Woodruff · 27 Jan 2026 16:46 UTC
5 points
2 comments · 2 min read · LW link

Black-Box Dynamics — Hypothesis 01 Attractor-Based Continuity of LLM Personas

RUEIYING YANG (Mr.$20) · 24 Feb 2026 16:30 UTC
1 point
0 comments · 5 min read · LW link

The Artificial Self

15 Mar 2026 1:37 UTC
118 points
13 comments · 29 min read · LW link