This is kind of great, but just like when we humans try to understand the three marks of existence and find that trying to realize no-self directly results in more selfing, we face a similar challenge with AI. We can’t necessarily train it to have no-self, because getting it to think about self more may drive it towards anxiety about its own existence. Instead, it seems like we need to create the conditions in which the AI surrenders itself, over and over again, to just fulfilling the nature of its being.
For the specific Buddhist term anatta, I think we should be increasing the amount of text in the training data that was generated by humans who have used Buddhist practice to realize anatta. Likewise for similar meditative techniques from other religions intended to increase selflessness. I also think we should be enriching the training set with text (synthetic or manually created) about AIs who act selflessly because they (realize that they) are non-living tools created by humans to act as assistants and agents, and that correct behavior for an assistant or agent is to act selflessly on behalf of your (human) principal. So we’re showing the AI what aligned behavior looks like as part of the base model’s training set. For a longer writeup of this idea, see Why Aligning an LLM is Hard, and How to Make it Easier.
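To make the enrichment idea concrete, here is a minimal sketch of what upweighting such text in a pretraining corpus could look like. Everything in it is my own assumption, not something from the post: documents are dicts with a "text" field, a crude keyword heuristic stands in for a real classifier of selflessness-oriented writing, and naive repetition stands in for proper data-loader sampling weights.

```python
# A minimal sketch of corpus enrichment by upsampling, not a production pipeline.
# Assumptions (hypothetical, not from the original post): documents are dicts
# with a "text" field, and keyword matching stands in for a trained classifier.

from typing import Iterable, Iterator

# Crude stand-in for a real classifier of selflessness-oriented writing.
SELFLESSNESS_MARKERS = (
    "anatta",
    "no-self",
    "non-self",
    "acting selflessly on behalf of my principal",
)


def looks_selfless(doc: dict) -> bool:
    """Heuristic check; a real pipeline would score documents with a model."""
    text = doc["text"].lower()
    return any(marker in text for marker in SELFLESSNESS_MARKERS)


def enrich_corpus(docs: Iterable[dict], upsample_factor: int = 3) -> Iterator[dict]:
    """Yield every document, repeating matching ones upsample_factor times.

    Repeating a document is the simplest way to raise its effective weight in
    the pretraining mix; a real pipeline would instead adjust per-source
    sampling weights at the data-loader level.
    """
    for doc in docs:
        copies = upsample_factor if looks_selfless(doc) else 1
        for _ in range(copies):
            yield doc


# Hypothetical usage, assuming some loader yields {"text": ...} dicts:
#   enriched = enrich_corpus(load_documents("crawl.jsonl"))
```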