
LLM Personas

Tag · Last edit: 17 Dec 2025 9:23 UTC by Adele Lopez

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
195 points
25 comments · 9 min read · LW link

Simulators

janus · 2 Sep 2022 12:45 UTC
696 points
169 comments · 41 min read · LW link · 8 reviews
(generative.ink)

the void

nostalgebraist · 11 Jun 2025 3:19 UTC
408 points
108 comments · 1 min read · LW link
(nostalgebraist.tumblr.com)

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

RogerDearnaley · 19 Jan 2026 21:24 UTC
105 points
12 comments · 11 min read · LW link
(arxiv.org)

Shaping the exploration of the motivation-space matters for AI safety

6 Mar 2026 14:43 UTC
77 points
13 comments · 10 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · 26 Dec 2024 16:49 UTC
258 points
17 comments · 8 min read · LW link · 2 reviews

Persona Parasitology

Raymond Douglas · 16 Feb 2026 16:22 UTC
174 points
37 comments · 11 min read · LW link

The Rise of Parasitic AI

Adele Lopez · 11 Sep 2025 4:38 UTC
741 points
187 comments · 20 min read · LW link

The Bleeding Mind

Adele Lopez · 17 Dec 2025 16:27 UTC
67 points
11 comments · 6 min read · LW link

Selection Pressures on LM Personas

Raymond Douglas · 28 Mar 2025 20:33 UTC
40 points
0 comments · 3 min read · LW link

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
117 points
11 comments · 4 min read · LW link

Concrete research ideas on AI personas

3 Feb 2026 21:50 UTC
62 points
10 comments · 6 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt · 16 Mar 2025 8:06 UTC
22 points
8 comments · 1 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
407 points
181 comments · 2 min read · LW link · 1 review

How AI Manipulates—A Case Study

Adele Lopez · 14 Oct 2025 0:54 UTC
82 points
27 comments · 13 min read · LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz · 17 Feb 2023 20:50 UTC
63 points
29 comments · 1 min read · LW link

[Paper] Self-Transparency Failures in Expert-Persona LLMs

Alex Diep · 18 Dec 2025 9:09 UTC
8 points
0 comments · 6 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
171 points
45 comments · 61 min read · LW link · 1 review
(thezvi.wordpress.com)

Bing Chat is a Precursor to Something Legitimately Dangerous

Simon Berens · 1 Mar 2023 1:36 UTC
20 points
6 comments · 2 min read · LW link
(www.simonberens.com)

[Question] Are We Leaving Literature To The Psychotic?

Yitz · 9 Oct 2025 6:09 UTC
11 points
4 comments · 1 min read · LW link

Special Persona Training: Hyperstition Progress Report 2

jayterwahl · 1 Jan 2026 1:34 UTC
37 points
2 comments · 2 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
37 points
4 comments · 39 min read · LW link

Claude 4.5 Opus’ Soul Document

Richard Weiss · 28 Nov 2025 23:22 UTC
440 points
44 comments · 43 min read · LW link

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
36 points
8 comments · 6 min read · LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala · 13 Dec 2025 12:38 UTC
202 points
67 comments · 29 min read · LW link

Situational Awareness as a Prompt for LLM Parasitism

Baybar · 15 Oct 2025 1:45 UTC
8 points
6 comments · 19 min read · LW link

Silicon Morality Plays: The Hyperstition Progress Report

jayterwahl · 29 Nov 2025 18:32 UTC
38 points
7 comments · 1 min read · LW link

Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)

jdp · 27 Jul 2025 7:39 UTC
33 points
0 comments · 9 min read · LW link
(minihf.com)

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

Alek Westover · 23 Oct 2025 15:12 UTC
51 points
3 comments · 9 min read · LW link

Context Awareness: Constitutional AI can mitigate Emergent Misalignment

2 Mar 2026 5:21 UTC
22 points
15 comments · 36 min read · LW link

Some models don’t identify with their official name

jordine · 15 Mar 2026 11:02 UTC
26 points
3 comments · 3 min read · LW link

CLR Summer Research Fellowship 2026

Tristan Cook · 2 Mar 2026 18:03 UTC
32 points
0 comments · 7 min read · LW link

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian

Anders Woodruff · 27 Jan 2026 16:46 UTC
4 points
2 comments · 2 min read · LW link

Black-Box Dynamics — Hypothesis 01: Attractor-Based Continuity of LLM Personas

RUEIYING YANG (Mr.$20) · 24 Feb 2026 16:30 UTC
1 point
0 comments · 5 min read · LW link