
Apart Research

Last edit: 10 Feb 2024 4:38 UTC by jas-ho

Apart Research is an AI safety research and talent development organization. This tag includes posts written by Apart researchers and content about Apart Research.

Results from the interpretability hackathon

17 Nov 2022 14:51 UTC
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)

Black Box Investigation Research Hackathon

12 Sep 2022 7:20 UTC
9 points
4 comments · 2 min read · LW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran · 22 Oct 2022 16:17 UTC
25 points
0 comments · 1 min read · LW link

Results from the language model hackathon

Esben Kran · 10 Oct 2022 8:29 UTC
22 points
1 comment · 4 min read · LW link

AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran · 17 Oct 2022 17:01 UTC
24 points
0 comments · 1 min read · LW link

Safety timelines: How long will it take to solve alignment?

19 Sep 2022 12:53 UTC
37 points
7 comments · 6 min read · LW link
(forum.effectivealtruism.org)

NeurIPS Safety & ChatGPT. MLAISU W48

2 Dec 2022 15:50 UTC
3 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

9 Dec 2022 10:38 UTC
19 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Join the AI Testing Hackathon this Friday

Esben Kran · 12 Dec 2022 14:24 UTC
10 points
0 comments · 1 min read · LW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran · 16 Dec 2022 11:03 UTC
12 points
7 comments · 4 min read · LW link
(newsletter.apartresearch.com)

AI improving AI [MLAISU W01!]

Esben Kran · 6 Jan 2023 11:13 UTC
5 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Robustness & Evolution [MLAISU W02]

Esben Kran · 13 Jan 2023 15:47 UTC
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

Generalizability & Hope for AI [MLAISU W03]

Esben Kran · 20 Jan 2023 10:06 UTC
5 points
2 comments · 2 min read · LW link
(newsletter.apartresearch.com)

We Found An Neuron in GPT-2

11 Feb 2023 18:27 UTC
141 points
22 comments · 7 min read · LW link
(clementneo.com)

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

23 Feb 2023 10:48 UTC
8 points
0 comments · 6 min read · LW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes · 8 Jan 2024 17:10 UTC
6 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Identifying semantic neurons, mechanistic circuits & interpretability web apps

13 Apr 2023 11:59 UTC
18 points
0 comments · 8 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
11 points
0 comments · 5 min read · LW link

Join the interpretability research hackathon

Esben Kran · 28 Oct 2022 16:26 UTC
15 points
0 comments · 1 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · 31 Oct 2022 11:38 UTC
20 points
1 comment · 1 min read · LW link
(christophm.github.io)