Alignment Jam (tag)
Last edit: 16 May 2023 8:25 UTC by Esben Kran

This tag lists posts that came out of the Alignment Jam hackathons.

Towards AI Safety Infrastructure: Talk & Outline
Paul Bricman, 7 Jan 2024 9:31 UTC. 10 points, 0 comments, 2 min read (www.youtube.com)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon
Esben Kran, 19 Apr 2024 14:46 UTC. 5 points, 0 comments, 1 min read (www.apartresearch.com)

Results from the AI testing hackathon
Esben Kran, 2 Jan 2023 15:46 UTC. 13 points, 0 comments, 1 min read

Superposition and Dropout
Edoardo Pona, 16 May 2023 7:24 UTC. 21 points, 5 comments, 6 min read

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1
StefanHex and Marius Hobbhahn, 9 May 2023 19:41 UTC. 119 points, 1 comment, 10 min read

Results from the interpretability hackathon
Esben Kran and Neel Nanda, 17 Nov 2022 14:51 UTC. 81 points, 0 comments, 6 min read (alignmentjam.com)

Identifying semantic neurons, mechanistic circuits & interpretability web apps
Esben Kran and Neel Nanda, 13 Apr 2023 11:59 UTC. 18 points, 0 comments, 8 min read

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2
StefanHex and Marius Hobbhahn, 25 May 2023 15:37 UTC. 71 points, 1 comment, 13 min read

We Found An Neuron in GPT-2
Joseph Miller and Clement Neo, 11 Feb 2023 18:27 UTC. 141 points, 22 comments, 7 min read (clementneo.com)

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!
StefanHex, 24 Jan 2023 18:45 UTC. 47 points, 5 comments, 13 min read

Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen and viluon, 15 Jul 2023 19:12 UTC. 44 points, 5 comments, 9 min read