AI Safety Camp (AISC) is a non-profit initiative that runs programs for researchers with diverse skills who want to try collaborating on an open problem in reducing AI existential risk.

AISC 2024 - Project Summaries

NickyP
27 Nov 2023
48 points
3 comments

AI Safety Camp 2024

Linda Linsefors
18 Nov 2023
15 points
1 comment

AISC Project: Modelling Trajectories of Language Models

NickyP
13 Nov 2023
26 points
0 comments
12 min read
LW link

The first AI Safety Camp & onwards

Remmelt
7 Jun 2018
46 points
0 comments
8 min read
LW link

How teams went about their research at AI Safety Camp edition 5

Remmelt
28 Jun 2021
24 points
0 comments
6 min read
LW link

Thoughts on AI Safety Camp

Charlie Steiner
13 May 2022
33 points
8 comments

Applications for AI Safety Camp 2022 Now Open!

adamShimi
17 Nov 2021
47 points
3 comments

Trust-maximizing AGI

25 Feb 2022
7 points
26 comments

This might be the last AI Safety Camp

24 Jan 2024
192 points
34 comments

A Study of AI Science Models

13 May 2023
20 points
0 comments
24 min read
LW link

Open Problems in Negative Side Effect Minimization

6 May 2022
12 points
6 comments

[Aspiration-based designs] 1. Informal introduction

28 Apr 2024
41 points
4 comments

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs
13 Nov 2023
17 points
0 comments
8 min read
LW link

Announcing the second AI Safety Camp

Lachouette
11 Jun 2018
34 points
0 comments
1 min read
LW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith
29 Sep 2021
30 points
7 comments

Machines vs. Memes 2: Memetically-Motivated Model Extensions

naterush
31 May 2022
6 points
0 comments
4 min read
LW link

AISC 2023, Progress Report for March: Team Interpretable Architectures

2 Apr 2023
14 points
0 comments
14 min read
LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23
1 Jun 2022
7 points
0 comments
7 min read
LW link

Steganography and the CycleGAN—alignment failure case study

Jan Czechowski
11 Jun 2022
34 points
0 comments
4 min read
LW link

Reflection Mechanisms as an Alignment target: A survey

22 Jun 2022
32 points
1 comment

AISC9 has ended and there will be an AISC10

Linda Linsefors
29 Apr 2024
74 points
4 comments

Towards a formalization of the agent structure problem

Alex_Altair
29 Apr 2024
52 points
5 comments

Apply to lead a project during the next virtual AI Safety Camp

13 Sep 2023
19 points
0 comments
5 min read
LW link

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors
27 Sep 2023
22 points
12 comments

AI Safety Research Camp—Project Proposal

David_Kristoffersson
2 Feb 2018
29 points
11 comments

Extraction of human preferences 👨→🤖

arunraja-hub
24 Aug 2021
18 points
2 comments

Theories of Modularity in the Biological Literature

4 Apr 2022
51 points
13 comments

Project Intro: Selection Theorems for Modularity

4 Apr 2022
72 points
20 comments

How teams went about their research at AI Safety Camp edition 8

9 Sep 2023
28 points
0 comments
13 min read
LW link

AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems

27 Sep 2021
8 points
3 comments

Survey on AI existential risk scenarios

8 Jun 2021
65 points
11 comments

Acknowledging Human Preference Types to Support Value Learning

Nandi
13 Nov 2018
34 points
4 comments

Empirical Observations of Objective Robustness Failures

23 Jun 2021
63 points
5 comments

Discussion: Objective Robustness and Inner Alignment Terminology

23 Jun 2021
73 points
7 comments

A survey of tool use and workflows in alignment research

23 Mar 2022
45 points
4 comments

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet Farlow
31 May 2022
18 points
1 comment

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo
30 Nov 2022
53 points
5 comments

Results from a survey on tool use and workflows in alignment research

19 Dec 2022
79 points
2 comments

A descriptive, not prescriptive, overview of current AI Alignment Research

6 Jun 2022
138 points
21 comments

AI Safety Camp, Virtual Edition 2023

Linda Linsefors
6 Jan 2023
40 points
10 comments

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou
6 Jan 2023
3 points
0 comments
1 min read
LW link

Inherently Interpretable Architectures

30 Jun 2023
4 points
0 comments
7 min read
LW link

Funding case: AI Safety Camp

12 Dec 2023
66 points
5 comments

Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”

Vassil Tashev
29 Feb 2024
11 points
0 comments
4 min read
LW link

Interview: Applications w/ Alice Rigg

jacobhaimes
19 Dec 2023
12 points
0 comments
1 min read
LW link

INTERVIEW: StakeOut.AI w/ Dr. Peter Park

jacobhaimes
4 Mar 2024
6 points
0 comments
1 min read
LW link

Training-time domain authorization could be helpful for safety

25 May 2024
15 points
4 comments

A Review of Weak to Strong Generalization [AI Safety Camp]

sevdeawesome
7 Mar 2024
10 points
0 comments
9 min read
LW link

Inducing human-like biases in moral reasoning LMs

20 Feb 2024
21 points
3 comments

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes
18 Mar 2024
5 points
0 comments
1 min read
LW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes
26 Mar 2024
3 points
0 comments
2 min read
LW link

Immunization against harmful fine-tuning attacks

6 Jun 2024
4 points
0 comments
12 min read
LW link

“Open Source AI” is a lie, but it doesn’t have to be

jacobhaimes
30 Apr 2024
18 points
5 comments

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.

sevdeawesome
1 Jul 2024
21 points
0 comments
17 min read
LW link

Intro to Ontogenetic Curriculum

Eris
13 Apr 2023
19 points
1 comment

Paths to failure

25 Apr 2023
29 points
1 comment

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

domenicrosati
11 Nov 2023
25 points
1 comment

The Science Algorithm AISC Project

Johannes C. Mayer
13 Nov 2023
12 points
0 comments
1 min read
LW link

AISC project: SatisfIA – AI that satisfies without overdoing it

Jobst Heitzig
11 Nov 2023
12 points
0 comments
1 min read
LW link

AISC project: TinyEvals

Jett
22 Nov 2023
22 points
0 comments
4 min read
LW link

AISC project: How promising is automating alignment research? (literature review)

Bogdan Ionut Cirstea
28 Nov 2023
4 points
1 comment

Agentic Mess (A Failure Story)

6 Jun 2023
44 points
5 comments

AISC team report: Soft-optimization, Bayes and Goodhart

27 Jun 2023
37 points
2 comments

A Friendly Face (Another Failure Story)

20 Jun 2023
65 points
21 comments


30 Jun 2023
7 points
0 comments
2 min read
LW link

Positive Attractors

30 Jun 2023
6 points
0 comments
13 min read
LW link

The Control Problem: Unsolved or Unsolvable?

Remmelt
2 Jun 2023
50 points
46 comments

“Wanting” and “liking”

Mateusz Bagiński
30 Aug 2023
22 points
2 comments