Road To AI Safety Excellence

Last edit: 16 Aug 2022 17:43 UTC by Stephen McAleese

Road to AI Safety Excellence (RAISE), previously named AASAA, was an initiative from toonalfrink to improve the pipeline for AI safety researchers, especially by creating an online course. See the Post-Mortem.

Note

This page is deprecated, and the RAISE founder will no longer update it (though an independent party may decide to). See the updated page at http://aisafety.camp/about/

Motivation

AI safety is a small field. It has only about 100 researchers. The field is mostly talent-constrained. Given the dangers of an uncontrolled intelligence explosion, increasing the number of AIS researchers is crucial for the long-term survival of humanity.

Within the LW community there are plenty of talented people who bear a sense of urgency about AI. They are willing to switch careers to doing research, but they are unable to get there. This is understandable: the path up to research-level understanding is lonely, arduous, long, and uncertain. It is like a pilgrimage. One has to study concepts from the papers in which they first appeared. This is not easy. Such papers are undistilled. Unless one is lucky, there is no one to provide guidance and answer questions. Then, should one come out on top, there is no guarantee that the quality of their work will be sufficient for a paycheck or a useful contribution.

The field of AI safety is in an innovator phase. Innovators are highly risk-tolerant and have a large amount of agency, which allows them to survive an environment with little guidance or supporting infrastructure. Let community organisers not fall for the typical mind fallacy, expecting risk-averse people to move into AI safety all by themselves. Unless one is particularly risk-tolerant or has a perfect safety net, they will not be able to fully take the plunge. Plenty of measures can be taken to make getting into AI safety more like an “It’s a small world” ride:

Let there be a tested path with signposts along the way to make progress clear and measurable.
Let there be social reinforcement so that we are not hindered but helped by our instinct for conformity.
Let there be high-quality explanations of the material to speed up and ease the learning process, so that it is cheap.

Becoming an AIS researcher in 2020

What follows is a vision of how things *could* be, should this project come to fruition.

The path

1. Tim Urban’s Road to Superintelligence is a popular introduction to superintelligence. Hundreds of thousands of people have read it. At the end of the article is a link, saying “if you want to work on this, these guys can help”. It sends one to an Arbital page, reading “welcome to ‘prerequisites for “introduction to AI Safety”’”.

2. What follows is a series of articles explaining the math one should understand to be able to read AIS papers. It covers probability, game theory, computability theory, and a few other things. Most students with a technical major can follow along easily. Even some talented high school graduates do. When one comes to the end of the Arbital sequence, one is congratulated: “you are now ready to study AI safety”. A link to the course appears at the bottom of the page.

3. The course teaches an array of subfields: technical subjects like corrigibility, value learning, and ML safety, but also some high-level subjects like preventing arms races around AI. Assignments are designed in such a way that they don’t need manual grading, but do give some idea of the student’s competence. Sometimes there is an assignment about an open problem. Students are given the chance to try to solve it by themselves. Interesting submissions are noted. One competent recruiter looks through these assignments to handpick high-potential students. When a student completes the course, they are awarded a nice polished certificate. Something to print and hang on the wall.

Local study groups

When it comes to motivation, nothing beats the physical presence of people who share your goal. A clear and well-polished path is one major thing; social reinforcement is another. Some local study groups already exist, but there is no way for outsiders to find them. RAISE seems like a natural place to index study groups and facilitate hosting them. You can see and edit the current list here: https://bit.ly/AISafetyLocalGroups.

Course prerequisites & target audience

While the project originally targeted any student, it was decided that it will target those that are philosophically aligned first. The next step could be to persuade academics to model a course after this one, so that we will reach a broader audience too.

There are technical (math, logic) and philosophical (Bostrom/sequences/WaitButWhy) prerequisites. Technical prerequisites identified so far:

Probability theory
Decision/game theory
Computability theory
Logic
Linear algebra

As mentioned before, it seems best to cover this in a sequence of articles on Arbital, or to recommend an existing course that teaches this stuff well enough.

The state of the project & getting involved

If you’re enthusiastic about volunteering, fill in this form

To be low-key notified of progress, join this Facebook group

One particularly useful and low-bar way to contribute is to join our special study group, in which you will be asked to summarize AIS resources (papers, talks, …), and create mind maps of subjects. You can find it in the Facebook group.

Curriculum

This is (like everything) subject to debate, but for now it looks like the following broad categories will be covered:

Agent foundations
Machine learning safety
AI macrostrategy

Each of these categories will be divided into a few subcategories. The specifics of that are mostly undecided, except that the agent foundations category will contain at least corrigibility and decision theory.

We are making efforts to list all available resources here and here

Course development process

Now that volunteers and capital are largely in place, we are following an iterative development process, starting with the first unit on corrigibility. When we are satisfied with the quality of this unit, we will use the process we developed to create the other units.

Study groups

Even for volunteers it proved tricky to reach a high-level understanding of a topic by oneself, so we decided to learn together. The study group is constructed in such a way that it produces useful content for the course. More concretely:

- There are ‘scripting’ and ‘assignments’ meetings.

- The ‘scripting’ meetings embody an iterative process to go from papers to lecture scripts. We start with summaries, then we create mind maps, then we decide on a set of videos, and then we create a set of script drafts based on the summaries and mind maps.

- All of this content is used by the lecturer to finalize scripts, set up the studio and film.

- The set of videos produced by the lecturer is used as input to the ‘assignments’ meeting.

- At the ‘assignments’ meeting, for each lecture bit, attendees are asked to create assignments and try the assignments of others. A selection of these assignments is later added to the course.

Shooting lectures

We enlisted Rob Miles to shoot our lectures. About once a week, our content developer sits down with him to go over a particular script draft, which he modifies to his liking.

The setup includes a lightboard, which is a neat educational innovation that allows a lecturer to look at the camera while writing on a board simultaneously.

Instruction strategy

The course will be strictly digital, which limits the number of strategies that can be used. These are some potentially useful strategies:

Text
Lecture
Documentary
Game
Assignment
Live discussion
Open problem
etc...

Content guides form

The best way to present an idea often depends on the nature of the idea. For example, the value alignment problem is easily explained with an illustrative story (the paperclip maximizer). This isn’t quite the case for FDT. Also, some ideas have been formalized. We can go into mathematical detail with those. Other ideas are still in the realm of philosophy and we will have to resort to things like thought experiments there. How to say depends on what to say.

Gimmick: Open problems

(Inspired by The Failures of Eld Science) A special type of instruction strategy will be an assignment like this: “So here we have EDT, which is better than CDT, but it is still flawed in these ways. Can you think of a better decision theory that doesn’t have these flaws? Give it at least 10 minutes. If you have a useful idea, please let us know.”

The idea is to challenge students to think independently about how they might go about solving an open problem. It gives them an opportunity to actually make a contribution. I expect it to be strongly intrinsically motivating.

Taxonomy of content

At least three sorts of content will be delivered:

Anecdotes/stories to illustrate problems (paperclip maximizer, filling a cauldron, …)
Unformalized philosophical considerations (intelligence explosion, convergent instrumental goals, acausal trade, …)
Technical results (corrigibility, convergent instrumental goals, FDT, …)

Example course unit: value learning & corrigibility

Preview of unit and its structure
An x-minute lecture that informally explains the value learning problem
Assignments
A 5-minute cutscene shows a fictional story of an agent that keeps its creators from pushing the off-button
An x-minute lecture that informally explains corrigibility
A piece of text that introduces the math
A video of the lecturer solving example math assignments
Corrigibility math assignments

Alternatively, we can interleave tiny bits of video with questions to keep the student engaged. A good example of this is the Google deep learning course.

Task allocation

The following is a reply to the common remark that “I’d like to help, but I’m not sure what I can do”.

(last updated on 2018-01-31)

Full responsibility

This means you can’t sleep when things are off track, and jump to your laptop every time you have a new idea to move things forward. This also means you are ready to take on most tasks if no one else volunteers for it, even if you’re not specialized in it. The project is your baby, and you’re a helicopter parent.

Currently done by: Toon Alfrink, Veerle de Goederen, Remmelt Ellen, Johannes Heidecke, Mati Roy

Required technical understanding: superficial.

Minimum commitment: 1 full day per week

Armchair advice

You’re in the chat, and you’re interested in the project, but not ready to make significant contributions. You do want to see where things go, and sometimes you have some interesting remarks to make. On your own terms though.

Currently done by: lots of people

Minimum commitment: none

Content developer

As our content developer, you are responsible for the quality of the material. You coordinate the study group, review the quality of its output, and spend extra time on your own learning the content (if you haven’t already) so you can be our expert. You also help the lecturer with finalizing his scripts, and you assist him in understanding everything.

Currently done by: No one. This is a paid position. If interested, email us at raise@aisafety.camp

Required technical understanding: near-complete.

Minimum commitment: 2 full days per week

Giving lectures

You thoroughly study the material, making sure you know it well enough to explain it clearly. Together with the content developer, you sit down and go over the bits that need explanation. These bits range from 3 to 6 minutes, and they are interleaved with questions and small assignments to keep the student engaged.

Currently done by: Robert Miles

Required technical understanding: thorough.

Minimum commitment: 1 full day per week

Study group attendant

You help out in the weekly study group, creating summaries, mind maps, script drafts and assignments. We also give presentations.

Currently done by: Johannes Heidecke, Tom Rutten, Toon Alfrink, Tarn Somervell Fletcher, Nandi Schoots, Roland Pihlakas, Robert Miles, Rupert McCallum, Philine Widmer, Louie Terrill, Tim Bakker, Veerle de Goederen, Ofer Givoli

Required technical understanding: none

Minimum commitment: 4 hours per week

Software developer

With about 60% certainty, we will use ihatestatistics as a platform. The company is run by EAs (we may use it for free), and its specialization in statistics (which is closely related to AI) makes it well-suited for our needs. Here is a demo lesson. There are a lot of diamonds buried in the field of automated assessment. The quality of our answer-checking software determines the quality of the questions we can ask. Elaborate feedback mechanisms can make a lot of difference in how fast a learner may converge on the right kind of understanding. You write this software for us.

Minimum commitment: 2 days per week

Legal

Legal is a black box. Your first job is to write your job description.

Marketing/PR/acquisition

Are you good at connecting people? There are a lot of people that want to fix the world, would engage with this project if they knew about it, and have the means (funding, expertise) to help out. Things you can do include finding funders, hosting a round of review, inviting guest speakers with interesting credentials, connecting with relevant EA organisations, etc. Having high social capital in the EA/LW community is a plus.

Animation & editing

Good animation can make a course twice as polished and engaging, and this matters twice as much as you think. The whole point of a course instead of a loose collection of papers is that learners can trust they’re on the right track. Polish builds that trust. Animation is also a skill that is hard to pick up in a short enough timeframe, so we can’t do it. If you’re interested in AI safety and skilled at animation, we need you!

Peptalk

I want to note that what we are doing here isn’t hard. Courses at universities are often created on the fly by one person in a matter of weeks. They get away with it. There is little risk. The worst that can reasonably happen is that we waste some time and money on creating an unpopular course that doesn’t get much traction. On the other hand, there is a lot of opportunity. If we do this well, we might just double the number of FAI researchers. If that’s not impact, I don’t know what is.