Kaj_Sotala answers Examples of self-fulfilling prophecies in AI alignment?

Kaj_Sotala 21 Dec 2025 9:50 UTC
8 points
0
https://www.lesswrong.com/posts/TcfyGD2aKdZ7Rt3hk/alignment-pretraining-ai-discourse-causes-self-fulfilling
LLMs pretrained on data about misaligned AIs become less aligned themselves. Luckily, pretraining LLMs on synthetic data about good AIs helps them become more aligned.
- Chris Lakin 22 Dec 2025 3:24 UTC
  4 points
  0
  thank you