Introducing the AI Alignment Forum (FAQ)

After a few months of open beta, the AI Align­ment Fo­rum is ready to launch. It is a new web­site built by the team be­hind LessWrong 2.0, to help cre­ate a new hub for tech­ni­cal AI Align­ment re­search and dis­cus­sion. This is an in-progress FAQ about the new Fo­rum.

What are the five most im­por­tant high­lights about the AI Align­ment Fo­rum in this FAQ?

  • The vi­sion for the fo­rum is of a sin­gle on­line hub for al­ign­ment re­searchers to have con­ver­sa­tions about all ideas in the field...

  • ...while also pro­vid­ing a bet­ter on­board­ing ex­pe­rience for peo­ple get­ting in­volved with al­ign­ment re­search than ex­ists cur­rently.

  • There are three new se­quences fo­cus­ing on some of the ma­jor ap­proaches to al­ign­ment, which will up­date daily for the com­ing 6-8 weeks.

  • For non-members and future researchers, the place to interact with the content is LessWrong, where all Forum content will be crossposted.

  • The site will continue to be improved in the long term, as the team comes to better understand the needs and goals of researchers.

What is the pur­pose of the AI Align­ment Fo­rum?

Our first priority is obviously to avert catastrophic outcomes from unaligned Artificial Intelligence. We think the best way to achieve this at the margin is to build an online hub for AI Alignment research, which both allows the existing top researchers in the field to talk about cutting-edge ideas and approaches and supports the onboarding of new researchers and contributors.

We think that to solve the AI Align­ment prob­lem, the field of AI Align­ment re­search needs to be able to effec­tively co­or­di­nate a large num­ber of re­searchers from a large num­ber of or­gani­sa­tions, with sig­nifi­cantly differ­ent ap­proaches. Two decades ago we might have in­vested heav­ily in the de­vel­op­ment of a con­fer­ence or a jour­nal, but with the on­set of the in­ter­net, an on­line fo­rum with its abil­ity to do much faster and more com­pre­hen­sive forms of peer-re­view seemed to us like a more promis­ing way to help the field form a good set of stan­dards and method­olo­gies.

Who is the AI Align­ment Fo­rum for?

There ex­ists an in­ter­con­nected com­mu­nity of Align­ment re­searchers in in­dus­try, academia, and el­se­where, who have spent many years think­ing care­fully about a va­ri­ety of ap­proaches to al­ign­ment. Such re­search re­ceives in­sti­tu­tional sup­port from or­gani­sa­tions in­clud­ing FHI, CHAI, Deep­Mind, OpenAI, MIRI, Open Philan­thropy, and oth­ers. The Fo­rum mem­ber­ship cur­rently con­sists of re­searchers at these or­gani­sa­tions and their re­spec­tive col­lab­o­ra­tors.

The Forum is also intended to be a way for people not connected to these institutions, either professionally or socially, to interact with and contribute to the cutting-edge research. There have been many such individuals on LessWrong, and that is currently the best place for such people to start contributing, get feedback, and skill up in this domain.

There are about 50-100 mem­bers of the Fo­rum. Th­ese folks will be able to post and com­ment on the Fo­rum, and this group will not grow in size quickly.

Why do we need an­other web­site for al­ign­ment re­search?

There are many places online that host research on the alignment problem, such as the OpenAI blog, the DeepMind Safety Research blog, the Intelligent Agent Foundations Forum, AI-Align, and of course LessWrong.

But none of these spaces are set up to host dis­cus­sion amongst the 50-100 peo­ple work­ing in the field. And those that do host dis­cus­sion have un­clear as­sump­tions about what’s com­mon knowl­edge.

What type of con­tent is ap­pro­pri­ate for this Fo­rum?

As a rule-of-thumb, if a thought is some­thing you’d bring up when talk­ing to some­one at a re­search work­shop or a col­league in your lab, it’s also a wel­come com­ment or post here.

If you’d like a sense of what other Forum members are interested in, here’s some quick data on the high-level types of content they would like to see, taken from a survey we gave to invitees to the open beta (n = 34).

The responses were on a 1-5 scale, which represented “If I see 1 post per day, I want to see this type of content…”: (1) once per year, (2) once per 3-4 months, (3) once per 1-2 months, (4) once per 1-2 weeks, (5) a third of all posts that I see.

Here were the types of con­tent asked about, and the mean re­sponse:

  • New the­ory-ori­ented al­ign­ment re­search typ­i­cal of MIRI or CHAI: 4.4 /​ 5

  • New ML-ori­ented al­ign­ment re­search typ­i­cal of OpenAI or Deep­Mind’s safety teams: 4.2 /​ 5

  • New for­mal or nearly-for­mal dis­cus­sion of in­tel­lec­tu­ally in­ter­est­ing top­ics that look ques­tion­ably/​am­bigu­ously/​periph­er­ally al­ign­ment-re­lated: 3.5 /​ 5

  • High-qual­ity in­for­mal dis­cus­sion of al­ign­ment re­search method­ol­ogy and back­ground as­sump­tions, what’s needed for progress on differ­ent agen­das, why peo­ple are pur­su­ing this or that agenda, etc: 4.1 /​ 5

  • At­tempts to more clearly pack­age/​ex­plain/​sum­marise pre­vi­ously dis­cussed al­ign­ment re­search: 3.7 /​ 5

  • New tech­ni­cal ideas that are clearly not al­ign­ment-re­lated but are likely to be in­tel­lec­tu­ally in­ter­est­ing to fo­rum reg­u­lars: 2.2 /​ 5

  • High-qual­ity in­for­mal dis­cus­sion of very core back­ground ques­tions about ad­vanced AI sys­tems: 3.3 /​ 5

  • Typ­i­cal AGI fore­cast­ing re­search/​dis­cus­sion that isn’t ob­vi­ously un­usu­ally rele­vant to AGI al­ign­ment work: 2.2 /​ 5

Re­lated data: After in­te­grat­ing over all 34 re­spon­dents’ self-pre­dic­tions, they pre­dict 3.2 com­ments and 0.99 posts per day. We’ll re­port on ev­ery­one’s self-ac­cu­racy in a year ;)
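To put those totals in per-person terms, here is a rough back-of-the-envelope calculation. It assumes that "integrating over" the 34 respondents means summing their individual predictions, which the survey write-up does not spell out; the variable and function names are purely illustrative.

```python
# Totals reported above (summed over all survey respondents, by assumption).
RESPONDENTS = 34
TOTAL_COMMENTS_PER_DAY = 3.2
TOTAL_POSTS_PER_DAY = 0.99


def per_person(total_per_day: float, respondents: int) -> tuple[float, float]:
    """Return (items per day per person, days between items per person)."""
    rate = total_per_day / respondents
    return rate, 1.0 / rate


comment_rate, days_per_comment = per_person(TOTAL_COMMENTS_PER_DAY, RESPONDENTS)
post_rate, days_per_post = per_person(TOTAL_POSTS_PER_DAY, RESPONDENTS)

print(f"Average respondent: one comment every {days_per_comment:.1f} days")
print(f"Average respondent: one post every {days_per_post:.1f} days")
```

Under that assumption, the average respondent expects to comment roughly once every 10 to 11 days and to post roughly once every 34 days.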

What are the three new se­quences I’ve been hear­ing about?

We have been co­or­di­nat­ing with AI al­ign­ment re­searchers to cre­ate three new se­quences of posts that we hope can serve as in­tro­duc­tions to some of the most im­por­tant core ideas in AI Align­ment. The three new se­quences will be:

Over the next few weeks, we will be re­leas­ing about one post per day from these se­quences, start­ing with the first post in the Embed­ded Agency se­quence.

If you are in­ter­ested in learn­ing about AI al­ign­ment, you’re very wel­come to ask ques­tions and dis­cuss the con­tent in the com­ment sec­tions. And if you are already fa­mil­iar with a lot of the core ideas, then we would greatly ap­pre­ci­ate feed­back on the se­quences as we pub­lish them. We hope that these se­quences can be a ma­jor part of how new peo­ple get in­volved in AI al­ign­ment re­search, and so we care a lot about their qual­ity and clar­ity.

In what way is it eas­ier for po­ten­tial fu­ture Align­ment re­searchers to get in­volved?

Most scientific fields have to balance the need for high-context discussion among specialists against the need for public discussion, which allows the broader dissemination of new ideas, the onboarding of new members, and the opportunity for potential new researchers to prove themselves. We tried to design a system that still allows newcomers to participate and learn, while giving established researchers the space to have high-level discussions with other researchers.

To do that, we integrated the new AI Alignment Forum closely with the existing LessWrong platform. You can find and comment on all AI Alignment Forum content on LessWrong, and mods can move your comments and posts to the AI Alignment Forum for further engagement by the researchers. For details on the exact setup, see the question on that below.

We hope that this will re­sult in a sys­tem in which cut­ting-edge re­search and dis­cus­sion can hap­pen, while new good ideas and par­ti­ci­pants can get no­ticed and re­warded for their con­tri­bu­tions.

If you’ve been in­ter­ested in do­ing al­ign­ment re­search, then we think one of the best ways to do that right now is to com­ment on AI Align­ment Fo­rum posts on LessWrong, and check out the new con­tent we’ll be rol­ling out.

What is the ex­act setup with con­tent on LessWrong?

Here are the de­tails:

  • Automatic Crossposting – Any new post or comment on the new AI Alignment Forum is automatically cross-posted to LessWrong. Accounts are also shared between the two platforms.

  • Content Promotion – Any comment or post on LessWrong can be promoted by members of the AI Alignment Forum from LessWrong to the AI Alignment Forum.

  • Separate Reputation – The reputation systems for LessWrong and the AI Alignment Forum are separate. On LessWrong you can see two reputation scores: a primary karma score combining karma from both sites, and a secondary karma score specific to AI Alignment Forum members. On the AI Alignment Forum, you will see only a user’s AI Alignment karma.

  • Content Ownership – If a comment or post of yours is promoted to the AI Alignment Forum, you will continue to have full ownership of the content, and you’ll be able to respond directly to all comments by members on your content.
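
For readers who find it easier to think in code, here is a minimal sketch of how the two reputation scores described above could fit together. It is an illustration, not the site's actual implementation: the `User` class, the `apply_vote` and `visible_scores` functions, and the rule that only votes cast by Forum members on Forum content count toward AI Alignment karma are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    is_forum_member: bool = False
    lw_karma: int = 0  # primary score: karma from both sites combined
    af_karma: int = 0  # secondary score: AI Alignment Forum karma only


def apply_vote(author: User, voter: User, on_forum: bool, delta: int = 1) -> None:
    """Record one vote on the author's content under the assumed rules."""
    # Assumption: every vote feeds the combined LessWrong score.
    author.lw_karma += delta
    # Assumption: only Forum members voting on Forum content affect Forum karma.
    if on_forum and voter.is_forum_member:
        author.af_karma += delta


def visible_scores(user: User, site: str) -> dict[str, int]:
    """Return the scores a reader would see for this user on each site."""
    if site == "lesswrong":
        return {"karma": user.lw_karma, "alignment_karma": user.af_karma}
    if site == "alignmentforum":
        return {"alignment_karma": user.af_karma}
    raise ValueError(f"unknown site: {site}")


alice = User("alice")
member = User("member", is_forum_member=True)
visitor = User("visitor")

apply_vote(alice, member, on_forum=True)    # counts toward both scores
apply_vote(alice, visitor, on_forum=True)   # counts toward LessWrong karma only
apply_vote(alice, visitor, on_forum=False)  # counts toward LessWrong karma only

print(visible_scores(alice, "lesswrong"))
print(visible_scores(alice, "alignmentforum"))
```

The point of the sketch is only that the two scores are tracked separately and that each site chooses which of them to display.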

The AI Align­ment Fo­rum sur­vey (sent to all beta in­vi­tees) re­ceived 34 sub­mis­sions. One ques­tion asked whether the in­te­gra­tion with LW would lead to the per­son con­tribut­ing more or less to the AI Align­ment Fo­rum (on a range from 0 to 6). The mean re­sponse was 3.7, the me­dian was 3, and there was only one re­sponse be­low 3 (where 3 rep­re­sented ‘doesn’t mat­ter’).

How do new mem­bers get added to the Fo­rum?

There are about 50-100 mem­bers of the AI Align­ment Fo­rum, and while the num­ber will grow, it will grow rarely and slowly.

We’re talking with the alignment researchers at CHAI, DeepMind, OpenAI, and MIRI, and will be bringing on a moderator with invite-power from each of those organisations. They will naturally have a much better sense of the field and of the researchers in their orgs than we, the site designers, do. We’ll edit this post to include them once they’re confirmed.

On the AI Alignment Forum site, a small application form is available in the top right corner (after you have created an account). If you’re a regular contributor on LessWrong and want to point us to some of your best work, or if perhaps you’re a full-time researcher in an adjacent field and would like to participate in the Forum research discussion, you’re welcome to use that to let us know who you are and what research you have done.

Who is run­ning this pro­ject?

The AI Align­ment Fo­rum de­vel­op­ment team con­sists of Oliver Habryka, Ben Pace, Ray­mond Arnold, and Jim Bab­cock. We’re in con­ver­sa­tion with al­ign­ment re­searchers from Deep­Mind, OpenAI, MIRI and CHAI to con­firm mod­er­a­tors from those or­gani­sa­tions.

We would like to thank BERI, EA Grants, Nick Beckstead, Matt Wage and Eric Rogstad for the support that led to this Forum being built.

Can I use LaTeX?

Yes! You can use LaTeX in posts and com­ments with Cmd+4 /​ Ctrl+4.

Also, if you go into your user set­tings and switch to the mark­down ed­i­tor, you can just copy-paste LaTeX into a post/​com­ment and it will ren­der when you sub­mit with no fur­ther work.
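
As a rough illustration of what pasted LaTeX might look like in the markdown editor, here is a short snippet. It assumes the editor recognises the common dollar-sign delimiters (`$...$` for inline math, `$$...$$` for display math); this FAQ does not state which delimiters are used, so treat that as a guess, and note that the formula itself is just a placeholder.

```latex
The expected utility of a policy $\pi$ is
$$\mathbb{E}_{s \sim \pi}\left[ U(s) \right] = \sum_{s} P(s \mid \pi)\, U(s).$$
```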

(Talk to us in in­ter­com if you run into any prob­lems.)

I have a differ­ent ques­tion.

Use the com­ment sec­tion be­low. Alter­na­tively, use in­ter­com (bot­tom right cor­ner).
