Resources that (I think) new alignment researchers should know about
Thanks for writing this! Some other useful lists of resources:
- AI Safety Support’s giant list of useful links. It’s got a lot of good stuff in there and stays pretty up to date.
- List of AI safety technical courses and reading lists
- “AI safety resources and materials” tag on the EA Forum
- “Collections and resources” tag on the EA Forum
- “Collections and resources” tag on LessWrong
- “List of links” tag on LessWrong
This is super valuable! Would you be happy for this content to be integrated into Stampy so it can be maintained as a living document?
Sure!
Akash—this is very helpful; thanks for compiling it!
I’m struck that much of the advice for newbies interested in ‘AI alignment with human values’ focuses very heavily on the ‘AI’ side of alignment and not on the ‘human values’ side, even though many behavioral and social sciences have been studying human values for decades.
It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc that can help alignment newbies develop a more sophisticated understanding of the human psychology side of alignment.
I have a list of recommended nonfiction books here, but it’s not alignment-focused. From that list, though, I think many alignment researchers might benefit from reading ‘The Blank Slate’ (2002) by Steven Pinker, ‘The Righteous Mind’ (2012) by Jonathan Haidt, ‘Intelligence’ (2016) by Stuart Ritchie, etc.
Primarily because we’re not even close to that goal; right now, we’re trying to figure out how to avoid deceptive alignment.
If we’re nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don’t understand why anyone is advocating further AI research at this point.
Also, ‘avoiding deceptive alignment’ doesn’t really mean anything if we don’t have a relatively rich and detailed description of what ‘authentic alignment’ with human values would look like.
I’m truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we’re allegedly aligning with.
I agree with this. What if it were actually possible to formalize morality? (Cf. «Boundaries» for formalizing an MVP morality.) Inner alignment seems like it would be a lot easier with a good outer alignment function!
Mostly because ambitious value learning is really fucking hard, and this proposal runs into all the problems that ambitious or narrow value learning has.
You’re right, though, that AI capabilities progress will need to slow down, and I am not hopeful on that front.
Thanks for writing this! I would also add the CHAI internship to the list of programs within the AI safety community.