[Question] List of good AI safety project ideas?

Aryeh Englander 26 May 2021 22:36 UTC

LW: 24 AF: 10

Can we compile a list of good project ideas related to AI safety that people can work on? There are occasions at work when I have the opportunity to propose interesting project ideas for potential funding, and it would be really useful if there were somewhere I could look for projects that people here would really like someone to work on, even if they themselves don’t have the time or resources to do so. I also keep meeting people who are searching for useful alignment-related projects they can work on for school, for work, or as personal projects, and I think a list of project ideas might be helpful for them as well.

I’m particularly interested in project ideas that are not currently being worked on (to your knowledge) but that it would be great for someone to take up. I’m also interested in variations on projects that are currently being worked on, where nobody has yet attempted the variation but someone should.

Occasionally someone will post an idea or set of ideas on the Alignment Forum, for example Ajeya Cotra’s “sandwiching” idea or the recent list of ideas from Stuart Armstrong and Owain Evans. I also sometimes come across ideas mentioned towards the end of a paper or buried somewhere in a research agenda. But I think having a larger list somewhere could be really useful.

(Note: I am not looking for lists of open problems, challenges, or very general research directions. I’m looking for suggestions that at least point towards a concrete project idea, and where an individual or small team might be able to produce useful results given current technology and with sufficient time and resources.)

Please post ideas or links / references to published ideas in the comments, if you know of any. Ideas mentioned as part of a larger post or paper would count, but please point to the section where the idea is mentioned.

If I get enough links or references, maybe I’ll try to compile a list that others can use.

AI Collections and Resources

evhub 27 May 2021 19:02 UTC
LW: 9 AF: 5
Though they’re both somewhat outdated at this point, there are certainly still some interesting concrete experiment ideas to be found in my “Towards an empirical investigation of inner alignment” and “Concrete experiments in inner alignment.”
Gordon Seidoh Worley 27 May 2021 13:54 UTC
LW: 5 AF: 2
I wrote a research agenda that suggests additional work to be done and that I’m not doing.

https://www.lesswrong.com/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1
James_Miller 26 May 2021 23:59 UTC
4 points
I co-authored a paper suggesting that we take advantage of AI’s superhuman abilities in chess to create trustworthy and untrustworthy chess oracles to help develop strategies for dealing with possibly unfriendly oracles. https://arxiv.org/abs/2010.02911
Aryeh Englander 3 Jun 2021 12:53 UTC
LW: 2 AF: 1
New post on the EA Forum: Some AI Governance Research Ideas
Aryeh Englander 28 May 2021 19:25 UTC
LW: 2 AF: 1
Just came across this: Research ideas to study humans with AI Safety in mind
- Hardin Scott 5 Oct 2022 6:35 UTC
  1 point
  Great information! Thanks for sharing.
Esben Kran 31 Oct 2022 15:47 UTC
1 point
We have developed AI Safety Ideas, a collaborative AI safety research platform with a lot of research project ideas.

Vika 25 Jul 2021 22:26 UTC
LW: 2 AF: 1
Thanks Aryeh for collecting these! I added them to a new Project Ideas section in my AI Safety Resources list.