Jessica Rumbelow
Karma: 1,060
AI interpretability researcher
Introducing Leap Labs, an AI interpretability startup
Jessica Rumbelow
6 Mar 2023 16:16 UTC · 99 points · 11 comments · 1 min read
SolidGoldMagikarp III: Glitch token archaeology
mwatkins and Jessica Rumbelow
14 Feb 2023 10:17 UTC · 90 points · 30 comments · 16 min read
SolidGoldMagikarp II: technical details and more recent findings
mwatkins and Jessica Rumbelow
6 Feb 2023 19:09 UTC · 109 points · 45 comments · 13 min read
SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow and mwatkins
5 Feb 2023 22:02 UTC · 663 points · 204 comments · 12 min read
Guardian AI (Misaligned systems are all around us.)
Jessica Rumbelow
25 Nov 2022 15:55 UTC · 15 points · 6 comments · 2 min read
The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)
Jessica Rumbelow
17 Nov 2022 11:06 UTC · 27 points · 2 comments · 2 min read
Why I’m Working On Model Agnostic Interpretability
Jessica Rumbelow
11 Nov 2022 9:24 UTC · 26 points · 9 comments · 2 min read