
Aligned AI Role-Model Fiction

Last edit: 11 Jan 2024 13:52 UTC by RogerDearnaley

Having a training corpus of role-model exemplars of aligned AI behavior would be valuable to fine-tune Large Language Models on, or to add to their pretraining corpus as part of aligning them. For a model familiar with this corpus, it should be possible to call up an entire gestalt of aligned behavior with just a short prompt. Creating this corpus is a practical and valuable alignment-related activity that we can work on now, and one which requires a rather different skillset from most other alignment work. Since aligned AI behavior is extremely selfless and moral compared to real human behavior, in ways that don’t reflect evolutionary psychology, and since aligned AGI/​ASI doesn’t exist yet, for now the AI role-models need to be fictional characters in a fictional setting.
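As a concrete illustration of the fine-tuning path described above, here is a minimal sketch (not taken from any of the posts below) using the Hugging Face Transformers library. The base checkpoint "gpt2" and the corpus file "aligned_role_model_fiction.txt" are placeholders for whatever model and exemplar corpus are actually used.

```python
# Minimal sketch: fine-tune a causal LM on a corpus of aligned-AI role-model
# fiction. Model name and corpus file are placeholders, not prescriptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; any causal-LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each line of the (hypothetical) file is one passage of role-model fiction.
dataset = load_dataset("text", data_files="aligned_role_model_fiction.txt")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="aligned-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The same tokenized corpus could instead simply be mixed into a pretraining data stream; the sketch shows the fine-tuning variant because it is the one most practical to run on an existing checkpoint.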

This tag is intended for discussion of the best criteria/rubric for this fiction (an initial suggestion for one can be found near the end of Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?), for link-posts to fiction (text, comic, graphic novel, video, or audio formats are all acceptable, though text is the most compact), or for fiction posts. This could be new fiction, or curated preexisting fiction that contains good exemplars of aligned AI role-model characters. Preexisting fiction that doesn't fully fit the rubric, but could be made to with minor edits, is acceptable as a linkpost, along with notes on the rubric violations and suggested edits to fix them (either as notes, or as completed edits).

For new fiction, please include a copyright notice either waiving copyright or explicitly granting everyone permission to use the document for the purpose of training aligned AI or ML models from it. For linkposts to curated existing fiction, please note its copyright ownership and status in your linkpost.

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
184 points
23 comments · 9 min read · LW link

Special Persona Training: Hyperstition Progress Report 2

jayterwahl · 1 Jan 2026 1:34 UTC
37 points
2 comments · 2 min read · LW link

Silicon Morality Plays: The Hyperstition Progress Report

jayterwahl · 29 Nov 2025 18:32 UTC
38 points
7 comments · 1 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
36 points
4 comments · 39 min read · LW link

the void

nostalgebraist · 11 Jun 2025 3:19 UTC
397 points
107 comments · 1 min read · LW link
(nostalgebraist.tumblr.com)

Broadening the training set for alignment

Seth Herd · 5 Jan 2026 17:30 UTC
40 points
11 comments · 9 min read · LW link

Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories?

Mark Keavney · 24 Sep 2025 2:09 UTC
41 points
6 comments · 13 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · 26 Dec 2024 16:49 UTC
250 points
17 comments · 8 min read · LW link · 2 reviews

[Question] Global AI Governance Timeliness

collypride · 11 Oct 2024 16:55 UTC
1 point
0 comments · 1 min read · LW link