Resources that (I think) new alignment researchers should know about
Thanks for writing this! Some other useful lists of resources:
- AI Safety Support’s giant list of useful links. It’s got a lot of good stuff in there and stays pretty up to date.
- List of AI safety technical courses and reading lists
- “AI safety resources and materials” tag on the EA Forum
- “Collections and resources” tag on the EA Forum
- “Collections and resources” tag on LessWrong
- “List of links” tag on LessWrong
This is super valuable! Would you be happy for this content to be integrated into Stampy so it can be maintained as a living document?
Sure!
Akash—this is very helpful; thanks for compiling it!
I’m struck that much of the advice for newbies interested in ‘AI alignment with human values’ focuses very heavily on the ‘AI’ side of alignment and not on the ‘human values’ side, even though many behavioral and social sciences have been studying human values for decades.
It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc that can help alignment newbies develop a more sophisticated understanding of the human psychology side of alignment.
I have a list of recommended nonfiction books here, but it’s not alignment-focused. From that list, though, I think many alignment researchers might benefit from reading ‘The Blank Slate’ (2002) by Steven Pinker, ‘The Righteous Mind’ (2012) by Jonathan Haidt, ‘Intelligence’ (2016) by Stuart Ritchie, etc.
Primarily because we’re not even close to that goal; right now, we’re trying to figure out how to avoid deceptive alignment.
If we’re nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don’t understand why anyone is advocating further AI research at this point.
Also, ‘avoiding deceptive alignment’ doesn’t really mean anything if we don’t have a relatively rich and detailed description of what ‘authentic alignment’ with human values would look like.
I’m truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we’re allegedly aligning with.
I agree with this. What if it were actually possible to formalize morality? (Cf. «Boundaries» for formalizing an MVP morality.) Inner alignment seems like it would be a lot easier with a good outer alignment function!
Mostly because ambitious value learning is really fucking hard, and this proposal runs into all the problems that ambitious or narrow value learning has.
You’re right, though, that AI capabilities progress will need to slow down, and I am not hopeful on that front.
Thanks for writing this! I would also add the CHAI internship to the list of programs within the AI safety community.