AI Safety Camp

TagLast edit: 10 May 2022 6:23 UTC by Remmelt

AI Safety Camp (AISC) is a non-profit initiative to run programs for diversely skilled researchers who want to try collaborate on an open problem for reducing AI existential risk.

Official Website

AISC 2024 - Project Summaries

NickyP27 Nov 2023 22:32 UTC

48 points

3 comments18 min readLW link

AI Safety Camp 2024

Linda Linsefors18 Nov 2023 10:37 UTC

15 points

1 comment4 min readLW link

(aisafety.camp)

AISC Project: Modelling Trajectories of Language Models

NickyP13 Nov 2023 14:33 UTC

26 points

0 comments12 min readLW link

The first AI Safety Camp & onwards

Remmelt7 Jun 2018 20:13 UTC

46 points

0 comments8 min readLW link

How teams went about their research at AI Safety Camp edition 5

Remmelt28 Jun 2021 15:15 UTC

24 points

0 comments6 min readLW link

Thoughts on AI Safety Camp

Charlie Steiner13 May 2022 7:16 UTC

33 points

8 comments7 min readLW link

Applications for AI Safety Camp 2022 Now Open!

adamShimi17 Nov 2021 21:42 UTC

47 points

3 comments1 min readLW link

Trust-maximizing AGI

Jan and Karl von Wendt

25 Feb 2022 15:13 UTC

7 points

26 comments9 min readLW link

(universalprior.substack.com)

This might be the last AI Safety Camp

Remmelt and Linda Linsefors

24 Jan 2024 9:33 UTC

192 points

34 comments1 min readLW link

A Study of AI Science Models

Eleni Angelou and machinebiology

13 May 2023 23:25 UTC

20 points

0 comments24 min readLW link

Open Problems in Negative Side Effect Minimization

Fabian Schimpf and Lukas Fluri

6 May 2022 9:37 UTC

12 points

6 comments17 min readLW link

[Aspiration-based designs] 1. Informal introduction

B Jacobs, Jobst Heitzig, Simon Fischer and Simon Dima

28 Apr 2024 13:00 UTC

41 points

4 comments8 min readLW link

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs13 Nov 2023 14:51 UTC

17 points

0 comments8 min readLW link

Announcing the second AI Safety Camp

Lachouette11 Jun 2018 18:59 UTC

34 points

0 comments1 min readLW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC

30 points

7 comments10 min readLW link

Machines vs. Memes 2: Memetically-Motivated Model Extensions

naterush31 May 2022 22:03 UTC

6 points

0 comments4 min readLW link

AISC 2023, Progress Report for March: Team Interpretable Architectures

Robert Kralisch, Eris, teahorse and Sohaib Imran

2 Apr 2023 16:19 UTC

14 points

0 comments14 min readLW link

Machines vs Memes Part 3: Imitation and Memes

ceru231 Jun 2022 13:36 UTC

7 points

0 comments7 min readLW link

Steganography and the CycleGAN—alignment failure case study

Jan Czechowski11 Jun 2022 9:41 UTC

34 points

0 comments4 min readLW link

Reflection Mechanisms as an Alignment target: A survey

Marius Hobbhahn, elandgre and Beth Barnes

22 Jun 2022 15:05 UTC

32 points

1 comment14 min readLW link

AISC9 has ended and there will be an AISC10

Linda Linsefors29 Apr 2024 10:53 UTC

74 points

4 comments2 min readLW link

Towards a formalization of the agent structure problem

Alex_Altair29 Apr 2024 20:28 UTC

52 points

5 comments14 min readLW link

Apply to lead a project during the next virtual AI Safety Camp

Linda Linsefors and Remmelt

13 Sep 2023 13:29 UTC

19 points

0 comments5 min readLW link

(aisafety.camp)

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC

22 points

12 comments4 min readLW link

AI Safety Research Camp—Project Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC

29 points

11 comments8 min readLW link

Extraction of human preferences 👨→🤖

arunraja-hub24 Aug 2021 16:34 UTC

18 points

2 comments5 min readLW link

Theories of Modularity in the Biological Literature

CallumMcDougall, Avery and Lucius Bushnaq

4 Apr 2022 12:48 UTC

51 points

13 comments7 min readLW link

Project Intro: Selection Theorems for Modularity

CallumMcDougall, Avery and Lucius Bushnaq

4 Apr 2022 12:59 UTC

72 points

20 comments16 min readLW link

How teams went about their research at AI Safety Camp edition 8

Remmelt, Linda Linsefors and Kristi Uustalu

9 Sep 2023 16:34 UTC

28 points

0 comments13 min readLW link

AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems

Ariel Kwiatkowski, Quinn and bengr

27 Sep 2021 16:46 UTC

8 points

3 comments7 min readLW link

Survey on AI existential risk scenarios

Sam Clarke, apc and Jonas Schuett

8 Jun 2021 17:12 UTC

65 points

11 comments7 min readLW link

Acknowledging Human Preference Types to Support Value Learning

Nandi13 Nov 2018 18:57 UTC

34 points

4 comments9 min readLW link

Empirical Observations of Objective Robustness Failures

jbkjr and Lauro Langosco

23 Jun 2021 23:23 UTC

63 points

5 comments9 min readLW link

Discussion: Objective Robustness and Inner Alignment Terminology

jbkjr and Lauro Langosco

23 Jun 2021 23:25 UTC

73 points

7 comments9 min readLW link

A survey of tool use and workflows in alignment research

Logan Riggs, Jan, janus and jacquesthibs

23 Mar 2022 23:44 UTC

45 points

4 comments1 min readLW link

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet Farlow31 May 2022 22:03 UTC

18 points

1 comment6 min readLW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo30 Nov 2022 7:16 UTC

53 points

5 comments1 min readLW link

Results from a survey on tool use and workflows in alignment research

jacquesthibs, Jan, janus and Logan Riggs

19 Dec 2022 15:19 UTC

79 points

2 comments19 min readLW link

A descriptive, not prescriptive, overview of current AI Alignment Research

Jan, Logan Riggs, jacquesthibs and janus

6 Jun 2022 21:59 UTC

138 points

21 comments7 min readLW link

AI Safety Camp, Virtual Edition 2023

Linda Linsefors6 Jan 2023 11:09 UTC

40 points

10 comments3 min readLW link

(aisafety.camp)

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou6 Jan 2023 3:21 UTC

3 points

0 comments1 min readLW link

Inherently Interpretable Architectures

Robert Kralisch, teahorse, Eris and Sohaib Imran

30 Jun 2023 20:43 UTC

4 points

0 comments7 min readLW link

Funding case: AI Safety Camp

Remmelt and Linda Linsefors

12 Dec 2023 9:08 UTC

66 points

5 comments6 min readLW link

(manifund.org)

Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”

Vassil Tashev29 Feb 2024 18:44 UTC

11 points

0 comments4 min readLW link

Interview: Applications w/ Alice Rigg

jacobhaimes19 Dec 2023 19:03 UTC

12 points

0 comments1 min readLW link

(into-ai-safety.github.io)

INTERVIEW: StakeOut.AI w/ Dr. Peter Park

jacobhaimes4 Mar 2024 16:35 UTC

6 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Training-time domain authorization could be helpful for safety

domenicrosati, Jan Wehner and David Atanasov

25 May 2024 15:10 UTC

15 points

4 comments7 min readLW link

A Review of Weak to Strong Generalization [AI Safety Camp]

sevdeawesome7 Mar 2024 17:16 UTC

10 points

0 comments9 min readLW link

Inducing human-like biases in moral reasoning LMs

Artyom Karpov, Austin Meek, Bogdan Ionut Cirstea and SCho

20 Feb 2024 16:28 UTC

21 points

3 comments14 min readLW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes18 Mar 2024 21:21 UTC

5 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Podcast interview series featuring Dr. Peter Park

jacobhaimes26 Mar 2024 0:25 UTC

3 points

0 comments2 min readLW link

(into-ai-safety.github.io)

Immunization against harmful fine-tuning attacks

domenicrosati, Jan Wehner and David Atanasov

6 Jun 2024 15:17 UTC

4 points

0 comments12 min readLW link

“Open Source AI” is a lie, but it doesn’t have to be

jacobhaimes30 Apr 2024 23:10 UTC

18 points

5 comments6 min readLW link

(jacob-haimes.github.io)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.

sevdeawesome1 Jul 2024 5:50 UTC

21 points

0 comments17 min readLW link

Intro to Ontogenetic Curriculum

Eris13 Apr 2023 17:15 UTC

19 points

1 comment2 min readLW link

Paths to failure

Karl von Wendt and mespa

25 Apr 2023 8:03 UTC

29 points

1 comment8 min readLW link

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

domenicrosati11 Nov 2023 17:27 UTC

25 points

1 comment2 min readLW link

The Science Algorithm AISC Project

Johannes C. Mayer13 Nov 2023 12:52 UTC

12 points

0 comments1 min readLW link

(docs.google.com)

AISC project: SatisfIA – AI that satisfies without overdoing it

Jobst Heitzig11 Nov 2023 18:22 UTC

12 points

0 comments1 min readLW link

(docs.google.com)

AISC project: TinyEvals

Jett22 Nov 2023 20:47 UTC

22 points

0 comments4 min readLW link

AISC project: How promising is automating alignment research? (literature review)

Bogdan Ionut Cirstea28 Nov 2023 14:47 UTC

4 points

1 comment1 min readLW link

(docs.google.com)

Agentic Mess (A Failure Story)

Karl von Wendt, Sofia Bharadia, PeterDrotos, Artem Korotkov, mespa and mruwnik

6 Jun 2023 13:09 UTC

44 points

5 comments13 min readLW link

AISC team report: Soft-optimization, Bayes and Goodhart

Simon Fischer, benjaminko, jazcarretao, DFNaiff and Jeremy Gillen

27 Jun 2023 6:05 UTC

37 points

2 comments15 min readLW link

A Friendly Face (Another Failure Story)

Karl von Wendt, Sofia Bharadia, PeterDrotos, Artem Korotkov, mespa and mruwnik

20 Jun 2023 10:31 UTC

65 points

21 comments16 min readLW link

Introduction

Robert Kralisch, Eris, teahorse and Sohaib Imran

30 Jun 2023 20:45 UTC

7 points

0 comments2 min readLW link

Positive Attractors

Robert Kralisch, teahorse, Eris and Sohaib Imran

30 Jun 2023 20:43 UTC

6 points

0 comments13 min readLW link

The Control Problem: Unsolved or Unsolvable?

Remmelt2 Jun 2023 15:42 UTC

50 points

46 comments14 min readLW link

“Wanting” and “liking”

Mateusz Bagiński30 Aug 2023 14:52 UTC

22 points

2 comments29 min readLW link

plex 29 Aug 2021 15:59 UTC
3 points
I think this should be under the heading Organizations in the tag category AI.
- Remmelt 10 May 2022 6:02 UTC
  2 points
  Parent
  Do you mean as a sub-sub-tag? I think that would be good idea.
  
  I have looked at the LessWrong tag manager, but still do not know how to do it. Any tips?
  
  (if the idea is to merge it with the Organizations tag, I am biased of course, but there are enough posts tagged AI Safety Camp to warrant it being tagged as a distinguishable organisation)
  - Remmelt 10 May 2022 6:05 UTC
    2 points
    Parent
    Ah, found the other page, and I see it is already put under the category ‘Artificial Intelligence’, under the heading ‘Organizations’: https://www.lesswrong.com/tags/all
    
    Thanks for the help getting this sorted, @Plex.
    - plex 10 May 2022 12:49 UTC
      4 points
      Parent
      Yup, the all tags page is not openly editable, but Ruby gave me access a while back after I did a bunch of editing and categorization for the AI tags, so I added this.