I agree. I am a psychologist by training (cognitive, not clinical) who reads technical articles, and I see those parallels constantly.
This put me in mind of writing a short post titled something like “alignment includes psychology, whether we like it or not”. My previous short form on psychology and alignment was my most downvoted ever. I think it’s a repulsive concept to the types of people who work on alignment, for bad reasons and good. I think there are good reasons for being horrified if alignment requires a psychological approach. Psychology knows very little about getting desired results from humans. But wishing doesn’t make it otherwise. LLMs are quite similar to human minds in important ways (with important differences).
I feel that even this should have an additional caveat: doing psychology on current LLMs is not a solution to the alignment problem. But it seems like an important part of a realistic hodgepodge approach.
Which post are you referring to re: psychology and alignment?
It’s a point of shame. It was very short, not very good, and downvoted below zero. I’m thinking of writing a better version. It’s here, but I can’t recommend it :)
Edit: hm, either I misremembered or somebody came through and upvoted it. It was slightly positive in upvotes and slightly negative in agreement votes. Maybe I’m too sensitive.
2nd edit: it’s not actually as bad as I remember, either. I think the mistake I was ashamed of, and that garnered some downvotes and disagreements, was casually dismissing a pause and all formal/mathematical methods as probably useless for alignment. Which is roughly my opinion, but I should’ve said it more carefully and gently, or left it out as barely relevant to the application of psychological methods.
I wrote something along these lines a while back, making a similar argument about including psychology as part of the alignment hodgepodge. Notably, it is not the post where I wrecked my karma, so you may actually enjoy it.