Aligned AI Role-Model Fiction

A training corpus of role-model exemplars of aligned AI behavior would be valuable for fine-tuning Large Language Models, or for inclusion in their pretraining corpus, as part of aligning them. For a model familiar with this corpus, it should be possible to call up an entire gestalt of aligned behavior with just a short prompt. Creating this corpus is a practical and valuable alignment-related activity that we can work on now, and one which requires a rather different skillset from most other alignment work. Since aligned AI behavior is far more selfless and moral than real human behavior, in ways that don’t reflect evolutionary psychology, and since aligned AGI/ASI doesn’t exist yet, for now the AI role-models need to be fictional characters in a fictional setting.
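To make the fine-tuning option concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries. The base model (gpt2), corpus file name (aligned_ai_fiction.txt), and hyperparameters are illustrative placeholders, not anything specified by this tag.

```python
# Minimal sketch: fine-tune a causal LM on a plain-text corpus of
# aligned-AI role-model fiction. Model, file name, and hyperparameters
# below are placeholder assumptions for illustration only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# One story (or story fragment) per line in a plain-text corpus file.
dataset = load_dataset("text", data_files={"train": "aligned_ai_fiction.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="aligned-fiction-ft",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal LM) training labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice the corpus would more likely be mixed into a larger fine-tuning or pretraining dataset than trained on alone, as the paragraph above suggests.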

This tag is intended for discussion of the best criteria/rubric for this fiction (an initial suggestion can be found near the end of Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?), for link-posts to fiction (text, comic, graphic novel, video, or audio formats are all acceptable, though text is the most compact), or for fiction posts. This could be new fiction, or curated preexisting fiction containing good exemplars of aligned AI role-model characters. Preexisting fiction that doesn’t fully fit the rubric, but could be made to with minor edits, is also acceptable as a linkpost, along with notes on the rubric violations and suggested edits to fix them (either as notes, or as completed edits).

For new fiction, please include a copyright notice either waiving copyright or explicitly granting everyone permission to use the document for the purpose of training aligned AI or ML models. For linkposts to curated existing fiction, please note its copyright ownership and licensing terms in your linkpost.
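As an illustration (the license named here is just one possible choice), such a notice might read: “This work is dedicated to the public domain under CC0 1.0. The author additionally grants explicit permission to anyone to use it as training data for AI or machine-learning models.”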

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
34 points, 4 comments, 39 min read, LW link