LLM Personas

Last edit: 17 Dec 2025 9:23 UTC by Adele Lopez

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
200 points
25 comments · 9 min read · LW link

Simulators

janus · 2 Sep 2022 12:45 UTC
704 points
169 comments · 41 min read · LW link · 8 reviews
(generative.ink)

the void

nostalgebraist · 11 Jun 2025 3:19 UTC
410 points
108 comments · 1 min read · LW link
(nostalgebraist.tumblr.com)

Experimental Evidence for Simulator Theory — Part 1: Emergent Misalignment and Weird Generalizations

RogerDearnaley · 23 Mar 2026 22:37 UTC
25 points
0 comments · 53 min read · LW link

Experimental Evidence for Simulator Theory — Part 2: The Scalers Strike Back

RogerDearnaley · 23 Mar 2026 22:37 UTC
21 points
0 comments · 34 min read · LW link

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

RogerDearnaley · 19 Jan 2026 21:24 UTC
106 points
12 comments · 11 min read · LW link
(arxiv.org)

Shaping the exploration of the motivation-space matters for AI safety

6 Mar 2026 14:43 UTC
78 points
15 comments · 10 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · 26 Dec 2024 16:49 UTC
260 points
17 comments · 8 min read · LW link · 2 reviews

The Rise of Parasitic AI

Adele Lopez · 11 Sep 2025 4:38 UTC
757 points
191 comments · 20 min read · LW link

Persona Parasitology

Raymond Douglas · 16 Feb 2026 16:22 UTC
176 points
38 comments · 11 min read · LW link

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
119 points
11 comments · 4 min read · LW link

The Bleeding Mind

Adele Lopez · 17 Dec 2025 16:27 UTC
67 points
9 comments · 6 min read · LW link

Selection Pressures on LM Personas

Raymond Douglas · 28 Mar 2025 20:33 UTC
40 points
0 comments · 3 min read · LW link

Concrete research ideas on AI personas

3 Feb 2026 21:50 UTC
68 points
10 comments · 6 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt · 16 Mar 2025 8:06 UTC
22 points
8 comments · 1 min read · LW link

Role-playing vs Self-modelling

Jan_Kulveit · 7 Apr 2026 20:41 UTC
63 points
3 comments · 4 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
408 points
181 comments · 2 min read · LW link · 1 review

How AI Manipulates—A Case Study

Adele Lopez · 14 Oct 2025 0:54 UTC
82 points
27 comments · 13 min read · LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz · 17 Feb 2023 20:50 UTC
63 points
29 comments · 1 min read · LW link

[Paper] Self-Transparency Failures in Expert-Persona LLMs

Alex Diep · 18 Dec 2025 9:09 UTC
8 points
0 comments · 6 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
171 points
45 comments · 61 min read · LW link · 1 review
(thezvi.wordpress.com)

Bing Chat is a Precursor to Something Legitimately Dangerous

Simon Berens · 1 Mar 2023 1:36 UTC
20 points
6 comments · 2 min read · LW link
(www.simonberens.com)

[Question] Are We Leaving Literature To The Psychotic?

Yitz · 9 Oct 2025 6:09 UTC
11 points
4 comments · 1 min read · LW link

Special Persona Training: Hyperstition Progress Report 2

jayterwahl · 1 Jan 2026 1:34 UTC
37 points
2 comments · 2 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
37 points
4 comments · 39 min read · LW link

Claude 4.5 Opus’ Soul Document

Richard Weiss · 28 Nov 2025 23:22 UTC
441 points
44 comments · 43 min read · LW link

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
37 points
8 comments · 6 min read · LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala · 13 Dec 2025 12:38 UTC
202 points
67 comments · 29 min read · LW link

Situational Awareness as a Prompt for LLM Parasitism

Baybar · 15 Oct 2025 1:45 UTC
8 points
6 comments · 19 min read · LW link

Silicon Morality Plays: The Hyperstition Progress Report

jayterwahl · 29 Nov 2025 18:32 UTC
38 points
7 comments · 1 min read · LW link

Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)

jdp · 27 Jul 2025 7:39 UTC
33 points
0 comments · 9 min read · LW link
(minihf.com)

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

Alek Westover · 23 Oct 2025 15:12 UTC
51 points
3 comments · 9 min read · LW link

Context Awareness: Constitutional AI can mitigate Emergent Misalignment

2 Mar 2026 5:21 UTC
25 points
18 comments · 36 min read · LW link

Models differ in identity propensities

16 Mar 2026 10:45 UTC
58 points
0 comments · 14 min read · LW link

I’m Bearish On Personas For ASI Safety

J Bostock · 1 Mar 2026 16:22 UTC
67 points
41 comments · 10 min read · LW link

CLR Summer Research Fellowship 2026

Tristan Cook · 2 Mar 2026 18:03 UTC
32 points
0 comments · 7 min read · LW link

Is Claude’s genuine uncertainty performative?

jordinne · 8 Apr 2026 9:26 UTC
21 points
1 comment · 4 min read · LW link

A List of Research Directions in Character Training

Rauno Arike · 19 Mar 2026 22:58 UTC
45 points
21 comments · 8 min read · LW link

AI character is a big deal

23 Mar 2026 16:36 UTC
34 points
33 comments · 12 min read · LW link
(www.forethought.org)

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian

Anders Cairns Woodruff · 27 Jan 2026 16:46 UTC
5 points
2 comments · 2 min read · LW link

Black-Box Dynamics — Hypothesis 01 Attractor-Based Continuity of LLM Personas

RUEIYING YANG (Mr.$20) · 24 Feb 2026 16:30 UTC
1 point
0 comments · 5 min read · LW link

The Artificial Self

15 Mar 2026 1:37 UTC
118 points
13 comments · 29 min read · LW link