Some (potentially) fundable AI Safety Ideas

  • There are many AI Safety ideas I’d like to pursue but currently don’t have the time for. Feel free to take these ideas, apply for funding, and, if you get funded, thank me by reducing existential risk.

  • [Note: I am not a funding manager, nor do I disburse funds in any way]

Literature reviews for each AI Safety Research Agenda

  • It can be difficult to know what specifically to work on in AI Safety, even after finishing [AI Safety Camp/ AGI Fundamentals Course/ internships/ etc]. It would be useful to have a pinned post on the Alignment Forum that listed ~10 research agendas, each including:

    • 1. A clear problem statement (e.g. “Mesa-optimizers are …; here’s a link to Rob Miles’s YouTube video on the subject”)

    • 2. Current work w/ links/citations

    • 3. Current Open Problems

    • 4. People currently working on the problem w/ contact information

  • Updated at least every 6 months, if not sooner.

  • It could be structured with:

    • 1. One high-quality manager who makes lit reviews for 1-2 fields, manages other people doing the same, consolidates their work, and thinks about the structure/format of the documents

    • 2. Three to five people, each intrinsically interested in their field of choice, making lit reviews for 1-2 different fields.

  • If I were pursuing this idea (which, again, I’m not), I would pick an alignment researcher’s agenda I’m interested in, go through their work, write small summaries of posts for my own sake, and draft the lit review. I’d then contact them for a call (through LessWrong), send them a link to the document, and update/publish based on their feedback.

  • If I were the manager, I’d make an open call for interviews, asking applicants for their alignment research field of interest, a clear problem statement, and how that problem statement connects to reducing existential risk.

  • A failure mode would be a 90-page report that no one reads, or one that misses the core point driving the researcher. It’d be great to end up with a Maxwell’s-equations-style reformulation, or at least a “that’s a much better way to phrase it” from the researcher.

Fundability

  • If I wanted to signal my ability to complete the above project, I would need previous academic experience, connections to people who know grantmakers, or, barring that, an actual draft of one such literature review, saying “Pay me money, and I’ll make more.”

  • Though, again, note that I’m not a funding manager.

AI Safety Conference

We don’t have a conference. Let’s make one. One could argue that it’s better to stay connected to the broader AI/ML/etc. communities by publishing at their venues, but why not do both? I think this is possible as long as this conference doesn’t have official proceedings. From NeurIPS:

> Can I submit work that is in submission to, has been accepted to, or has been published in a non-archival venue (e.g. arXiv or a workshop without any official proceedings)? Answer: Yes, as long as this does not violate the other venue’s policy on dual submissions (if it has one).

The connections made and research discussed would be well worth it.

Fundability

I think this is a great idea and generally fundable; one way to signal competence is previous experience organizing conferences or EA Global events.

Alignment Forum Commenters W/​ Language Models

  • There are many Alignment Forum posts w/o high-quality comments. Providing feedback on direct alignment work is an essential part of the research process.

  • We could hire people to write such comments, using language model (LM) tools (like InstructGPT finetuned on LessWrong) to give higher-quality comments; a rough sketch of the workflow appears after this list. This process has two indirect benefits:

    • 1. Figuring out better ways to incorporate LMs into giving feedback on alignment research. With the familiarity gained from daily use, more novel ways of using LMs to automate the feedback process can be imagined.

    • 2. Human feedback on the LM-generated comments can be incorporated into the next iteration of the LM.

  • Both of these remain useful even once GPT-N is released, so the work won’t become outdated.

  • A more direct measure of success would be contacting the alignment researchers for their views on how useful the comments are, and on what would be more useful for them.

  • Failure modes include spamming lots of low-quality, nitpicking comments that don’t get to the core of an argument and waste researchers’ time. Another is giving comments to a researcher for whom “reading/responding to LW comments” isn’t an important part of their workflow.
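To make this concrete, here is a minimal sketch of what the tool-assisted commenting loop might look like. It is an illustration under stated assumptions, not a working product: the model ID `lw-finetuned-model`, the prompt, and the `draft_comment` helper are all hypothetical, and it assumes the pre-1.0 `openai` Python client with a human editing every draft before it gets posted.

```python
# Minimal sketch of an LM-assisted commenting workflow (illustrative only).
# Assumes the pre-1.0 `openai` Python client and a hypothetical model
# finetuned on LessWrong/Alignment Forum text ("lw-finetuned-model" is a
# placeholder, not a real model ID).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

PROMPT_TEMPLATE = """You are drafting a comment on an Alignment Forum post.
Summarize the post's central claim in one sentence, then raise the single
strongest objection or open question, with a concrete example if possible.

Post title: {title}

Post body:
{body}

Draft comment:"""


def draft_comment(title: str, body: str) -> str:
    """Return an LM-drafted comment for a human to edit before posting."""
    response = openai.Completion.create(
        model="lw-finetuned-model",  # hypothetical finetuned model
        prompt=PROMPT_TEMPLATE.format(title=title, body=body),
        max_tokens=300,
        temperature=0.7,
    )
    return response["choices"][0]["text"].strip()
```

The hired commenter would review and edit each draft before posting, and the (draft, final) pairs could be logged as feedback data for the next iteration of the LM (benefit 2 above).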

Fundability

  • If I were trying to signal I’d be good at this job, I would want a history of high-karma comments on Alignment Forum posts to point to.

  • [Note: I’m part of a group of people making LM tools for alignment researchers, including this idea, though I wouldn’t expect a prototype of this tool until May onwards.]

Bounty for ELK-like Questions

  • [Note: I’ve only skimmed ELK and could be completely off about this]

  • ELK is valuable because it is both easy to state and captures a core difficulty of alignment. Generating more questions with these properties would be worth more than $50k in my mind. These are questions I imagine we could pay academic researchers to work on directly, and have groups of university students tackle every year for the big prize money. This is also useful for convincing these people of alignment difficulties (e.g. if you thought proving P != NP was required for safe AGI (it’s not, btw, lol), then you might take it more seriously).

  • Here, I’m suggesting a bounty for more ELK-like questions, an investigation into the properties we want in a problem statement, facilitation of new bounties for the problems generated, and possibly outreach to pay people to work on them or to get university students to try them.

  • A failure mode is that no one actually produces any questions like these because it’s very hard. Another is that the resulting questions are still obtuse enough that paying people to work on them doesn’t work.

Fundability

  • You need to convince people that you can evaluate the importance of research problem statements, which seems very difficult to do without personal connections or being a bigger-name alignment researcher. I could imagine a lesser-known person (or someone who received an ELK prize) filling a “facilitator” role, incorporating expert feedback on the suggested questions.

  • I’d also imagine this could be facilitated by the community once there is a known funder.

Convincing Capability Researchers

  • Convincing someone to work on alignment is good. Convincing a capabilities researcher to work on alignment is doubly good (or just good if they suck at alignment, but at least they’re not doing capabilities!). This requires a certain set of soft skills and an understanding of other people, and possibly taking notes from street epistemology.

  • Additionally, I expect many alignment-sympathetic people have friends and acquaintances doing capabilities work and could benefit from being coached on how to approach the subject. Though this may sound weird and anti-social, it’s weirder if you believe your friend/colleague is contributing to ending the world and you haven’t even broached the subject.

  • Beyond soft skills, it would be great to generate a minimal set of the core arguments for alignment that doesn’t get bogged down in tangents (a set that would benefit from the experience of many conversations spent trying to convince others). This would aid in convincing capability researchers, since one could send them the link and then talk it over at dinner or drinks.

  • If I were to do this, I would make a list of the core arguments for alignment and a basic script/set of questions. I’d first ask people in the EA & SSC/ACX communities who aren’t convinced of alignment to donate an hour of their time over a video call with me, explicitly stating “I’m trying to practice my ‘convince capability researchers of alignment’ pitch,” and take the street epistemology approach.

Fundability

  • If you’ve convinced other people to take alignment seriously, or had success convincing people on other high-impact topics (e.g. vaccination, religion), this would be a good opportunity for you. Writing a “core alignment difficulties” post would additionally be useful for signaling.

Thoughts on Funding

  • If you want to actually apply for funding, AI Safety Support has a lot of very useful links (some grants accept applications on a rolling basis and some have something like a quarterly deadline).

  • FTX Future Fund deadline: March 21st

  • How much should you ask for? LTFF asked for a range of values; I gave “I can survive even if I’m hospitalized for a couple of weeks in the US” as my lower bound and “this is probably way too much money” as my upper bound, and got somewhere in the middle.

  • Evan Hubinger has an open invitation:

  • > if you have any idea of any way in which you think you could use money to help the long-term future, but aren’t currently planning on applying for a grant from any grant-making organization, I want to hear about it. Feel free to send me a private message on the EA Forum or LessWrong. I promise I’m not that intimidating :)

Feedback

  • I’m just quickly writing down my thoughts. This document would benefit from a much more rigorous devil’s advocate viewpoint, or from imagining how each project could fail.