RSS

Align­ment Jam

TagLast edit: 16 May 2023 8:25 UTC by Esben Kran

This lists the posts that have come from the Alignment Jam hackathons.

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC
10 points
0 comments2 min readLW link
(www.youtube.com)

Re­sults from the AI test­ing hackathon

Esben Kran2 Jan 2023 15:46 UTC
13 points
0 comments1 min readLW link

Su­per­po­si­tion and Dropout

Edoardo Pona16 May 2023 7:24 UTC
21 points
5 comments6 min readLW link

Solv­ing the Mechanis­tic In­ter­pretabil­ity challenges: EIS VII Challenge 1

9 May 2023 19:41 UTC
119 points
1 comment10 min readLW link

Re­sults from the in­ter­pretabil­ity hackathon

17 Nov 2022 14:51 UTC
81 points
0 comments6 min readLW link
(alignmentjam.com)

Iden­ti­fy­ing se­man­tic neu­rons, mechanis­tic cir­cuits & in­ter­pretabil­ity web apps

13 Apr 2023 11:59 UTC
18 points
0 comments8 min readLW link

Solv­ing the Mechanis­tic In­ter­pretabil­ity challenges: EIS VII Challenge 2

25 May 2023 15:37 UTC
71 points
1 comment13 min readLW link

We Found An Neu­ron in GPT-2

11 Feb 2023 18:27 UTC
141 points
22 comments7 min readLW link
(clementneo.com)

How-to Trans­former Mechanis­tic In­ter­pretabil­ity—in 50 lines of code or less!

StefanHex24 Jan 2023 18:45 UTC
47 points
5 comments13 min readLW link

Ro­bust­ness of Model-Graded Eval­u­a­tions and Au­to­mated Interpretability

15 Jul 2023 19:12 UTC
43 points
5 comments9 min readLW link
No comments.