LLM Personas
Tag · Last edit: 17 Dec 2025 9:23 UTC by Adele Lopez

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
Cam, Puria Radmard, Kyle O’Brien, David Africa, Samuel Ratnam and andyk · 21 Dec 2025 0:53 UTC · 184 points · 23 comments · 9 min read · LW link

Simulators
janus · 2 Sep 2022 12:45 UTC · 684 points · 169 comments · 41 min read · LW link · 8 reviews · (generative.ink)

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training
RogerDearnaley · 19 Jan 2026 21:24 UTC · 102 points · 12 comments · 11 min read · LW link · (arxiv.org)

the void
nostalgebraist · 11 Jun 2025 3:19 UTC · 396 points · 107 comments · 1 min read · LW link · (nostalgebraist.tumblr.com)

A Three-Layer Model of LLM Psychology
Jan_Kulveit · 26 Dec 2024 16:49 UTC · 250 points · 17 comments · 8 min read · LW link · 2 reviews

The Rise of Parasitic AI
Adele Lopez · 11 Sep 2025 4:38 UTC · 717 points · 180 comments · 20 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?
Karl von Wendt · 16 Mar 2025 8:06 UTC · 22 points · 8 comments · 1 min read · LW link

How AI Manipulates—A Case Study
Adele Lopez · 14 Oct 2025 0:54 UTC · 82 points · 27 comments · 13 min read · LW link

[Paper] Self-Transparency Failures in Expert-Persona LLMs
Alex Diep · 18 Dec 2025 9:09 UTC · 8 points · 0 comments · 6 min read · LW link

The Bleeding Mind
Adele Lopez · 17 Dec 2025 16:27 UTC · 65 points · 10 comments · 6 min read · LW link

Selection Pressures on LM Personas
Raymond Douglas · 28 Mar 2025 20:33 UTC · 40 points · 0 comments · 3 min read · LW link

Special Persona Training: Hyperstition Progress Report 2
jayterwahl · 1 Jan 2026 1:34 UTC · 37 points · 2 comments · 2 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley · 11 Jan 2024 12:56 UTC · 36 points · 4 comments · 39 min read · LW link

Claude 4.5 Opus’ Soul Document
Richard Weiss · 28 Nov 2025 23:22 UTC · 440 points · 44 comments · 43 min read · LW link

Backdoor awareness and misaligned personas in reasoning models
James Chua, Owain_Evans and Jan Betley · 20 Jun 2025 23:38 UTC · 35 points · 8 comments · 6 min read · LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)
Kaj_Sotala · 13 Dec 2025 12:38 UTC · 198 points · 66 comments · 29 min read · LW link

Silicon Morality Plays: The Hyperstition Progress Report
jayterwahl · 29 Nov 2025 18:32 UTC · 38 points · 7 comments · 1 min read · LW link

A Case for Model Persona Research
nielsrolf, Maxime Riché and Daniel Tan · 15 Dec 2025 13:35 UTC · 109 points · 8 comments · 4 min read · LW link

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?
Alek Westover · 23 Oct 2025 15:12 UTC · 51 points · 3 comments · 9 min read · LW link

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian
Anders Woodruff · 27 Jan 2026 16:46 UTC · 4 points · 2 comments · 2 min read · LW link