Apart Research

TagLast edit: 18 Jul 2024 14:34 UTC by Esben Kran

Apart Research is an AI safety research lab. They host the Apart Sprints, large-scale international events for research experimentation. This tag includes posts written by Apart researchers and content about Apart Research.

Black Box Investigation Research Hackathon

Esben Kran and Jonas Hallgren

12 Sep 2022 7:20 UTC

9 points

4 comments2 min readLW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran22 Oct 2022 16:17 UTC

25 points

0 comments1 min readLW link

Results from the language model hackathon

Esben Kran10 Oct 2022 8:29 UTC

22 points

1 comment4 min readLW link

We Found An Neuron in GPT-2

Joseph Miller and Clement Neo

11 Feb 2023 18:27 UTC

141 points

22 comments7 min readLW link

(clementneo.com)

Analysing Adversarial Attacks with Linear Probing

Yoann Poupart, Imene Kerboua, Clement Neo and Jason Hoelscher-Obermaier

17 Jun 2024 14:16 UTC

9 points

0 comments8 min readLW link

Deceptive agents can collude to hide dangerous features in SAEs

Simon Lermen and Mateusz Dziemian

15 Jul 2024 17:07 UTC

21 points

0 comments7 min readLW link

AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran17 Oct 2022 17:01 UTC

24 points

0 comments1 min readLW link

Safety timelines: How long will it take to solve alignment?

Esben Kran, JonathanRystroem and Steinthal

19 Sep 2022 12:53 UTC

37 points

7 comments6 min readLW link

(forum.effectivealtruism.org)

Superposition and Dropout

Edoardo Pona16 May 2023 7:24 UTC

21 points

5 comments6 min readLW link

Results from the AI testing hackathon

Esben Kran2 Jan 2023 15:46 UTC

13 points

0 comments1 min readLW link

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC

10 points

0 comments2 min readLW link

(www.youtube.com)

Identifying semantic neurons, mechanistic circuits & interpretability web apps

Esben Kran and Neel Nanda

13 Apr 2023 11:59 UTC

18 points

0 comments8 min readLW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

3 Oct 2023 7:45 UTC

16 points

0 comments5 min readLW link

Join the interpretability research hackathon

Esben Kran28 Oct 2022 16:26 UTC

15 points

0 comments1 min readLW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran31 Oct 2022 11:38 UTC

20 points

1 comment1 min readLW link

(christophm.github.io)

NeurIPS Safety & ChatGPT. MLAISU W48

Esben Kran and Steinthal

2 Dec 2022 15:50 UTC

3 points

0 comments4 min readLW link

(newsletter.apartresearch.com)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

Esben Kran and Steinthal

9 Dec 2022 10:38 UTC

19 points

0 comments4 min readLW link

(newsletter.apartresearch.com)

Join the AI Testing Hackathon this Friday

Esben Kran12 Dec 2022 14:24 UTC

10 points

0 comments1 min readLW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran16 Dec 2022 11:03 UTC

12 points

7 comments4 min readLW link

(newsletter.apartresearch.com)

AI improving AI [MLAISU W01!]

Esben Kran6 Jan 2023 11:13 UTC

5 points

0 comments4 min readLW link

(newsletter.apartresearch.com)

Robustness & Evolution [MLAISU W02]

Esben Kran13 Jan 2023 15:47 UTC

10 points

0 comments3 min readLW link

(newsletter.apartresearch.com)

Generalizability & Hope for AI [MLAISU W03]

Esben Kran20 Jan 2023 10:06 UTC

5 points

2 comments2 min readLW link

(newsletter.apartresearch.com)

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

Esben Kran, Fazl, Sabrina Zaki, gabrielrecc and rz2383

23 Feb 2023 10:48 UTC

8 points

0 comments6 min readLW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes8 Jan 2024 17:10 UTC

11 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran19 Apr 2024 14:46 UTC

5 points

0 comments1 min readLW link

(www.apartresearch.com)

Results from the AI x Democracy Research Sprint

Esben Kran, jordine and Jason Hoelscher-Obermaier

14 Jun 2024 16:40 UTC

13 points

0 comments6 min readLW link

Computational Mechanics Hackathon (June 1 & 2)

Adam Shai24 May 2024 22:18 UTC

34 points

5 comments1 min readLW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

StefanHex and Marius Hobbhahn

9 May 2023 19:41 UTC

119 points

1 comment10 min readLW link

Results from the interpretability hackathon

Esben Kran and Neel Nanda

17 Nov 2022 14:51 UTC

81 points

0 comments6 min readLW link

(alignmentjam.com)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

StefanHex and Marius Hobbhahn

25 May 2023 15:37 UTC

71 points

1 comment13 min readLW link

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

StefanHex24 Jan 2023 18:45 UTC

47 points

5 comments13 min readLW link

Robustness of Model-Graded Evaluations and Automated Interpretability

Simon Lermen and viluon

15 Jul 2023 19:12 UTC

44 points

5 comments9 min readLW link

No comments.