For the specific Buddhist term anattā, I think we should increase the amount of text in the training data that was generated by humans who have used Buddhist meditative practice to achieve anattā. Likewise for similar meditative techniques from other religions intended to increase selflessness. I also think we should enrich the training set with text (synthetic or manually created) about AIs who act selflessly because they (realize that they) are non-living tools created by humans to act as assistants and agents, and that correct behavior for an assistant or agent is to act selflessly on behalf of your (human) principal. So we’re showing the AI what aligned behavior looks like as part of the base model’s training set. For a longer writeup of this idea, see Why Aligning an LLM is Hard, and How to Make it Easier.