Erik Jenner
Karma: 2,043
Research Scientist on the Google DeepMind AGI Safety & Alignment team
- Evidence of Learned Look-Ahead in a Chess-Playing Neural Network (Erik Jenner; Jun 4, 2024, 3:50 PM) · 121 points · 14 comments · 13 min read · LW link
- Concrete empirical research projects in mechanistic anomaly detection (Erik Jenner, Viktor Rehnberg, and Oliver Daniels; Apr 3, 2024, 11:07 PM) · 43 points · 3 comments · 10 min read · LW link
- A gentle introduction to mechanistic anomaly detection (Erik Jenner; Apr 3, 2024, 11:06 PM) · 73 points · 2 comments · 11 min read · LW link
- CHAI internship applications are open (due Nov 13) (Erik Jenner; Oct 26, 2023, 12:53 AM) · 34 points · 0 comments · 3 min read · LW link
- A comparison of causal scrubbing, causal abstractions, and related methods (Erik Jenner, Adrià Garriga-alonso, and Egor Zverev; Jun 8, 2023, 11:40 PM) · 73 points · 3 comments · 22 min read · LW link
- [Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques (LawrenceC, Erik Jenner, and Leon Lang; Mar 16, 2023, 4:38 PM) · 48 points · 0 comments · 13 min read · LW link
- Natural Abstractions: Key Claims, Theorems, and Critiques (LawrenceC, Leon Lang, and Erik Jenner; Mar 16, 2023, 4:37 PM) · 241 points · 26 comments · 45 min read · LW link · 3 reviews
- Sydney can play chess and kind of keep track of the board state (Erik Jenner; Mar 3, 2023, 9:39 AM) · 64 points · 19 comments · 6 min read · LW link
- Research agenda: Formalizing abstractions of computations (Erik Jenner; Feb 2, 2023, 4:29 AM) · 93 points · 10 comments · 31 min read · LW link
- Abstractions as morphisms between (co)algebras (Erik Jenner; Jan 14, 2023, 1:51 AM) · 17 points · 1 comment · 8 min read · LW link
- Subsets and quotients in interpretability (Erik Jenner; Dec 2, 2022, 11:13 PM) · 26 points · 1 comment · 7 min read · LW link
- ARC paper: Formalizing the presumption of independence (Erik Jenner; Nov 20, 2022, 1:22 AM) · 97 points · 2 comments · 2 min read · LW link (arxiv.org)
- Response to Katja Grace’s AI x-risk counterarguments (Erik Jenner and Johannes Treutlein; Oct 19, 2022, 1:17 AM) · 77 points · 18 comments · 15 min read · LW link
- Disentangling inner alignment failures (Erik Jenner; Oct 10, 2022, 6:50 PM) · 23 points · 5 comments · 4 min read · LW link
- Good ontologies induce commutative diagrams (Erik Jenner; Oct 9, 2022, 12:06 AM) · 49 points · 5 comments · 14 min read · LW link
- How are you dealing with ontology identification? (Erik Jenner; Oct 4, 2022, 11:28 PM) · 34 points · 10 comments · 3 min read · LW link
- Breaking down the training/deployment dichotomy (Erik Jenner; Aug 28, 2022, 9:45 PM) · 30 points · 3 comments · 3 min read · LW link
- Reward model hacking as a challenge for reward learning (Erik Jenner; Apr 12, 2022, 9:39 AM) · 25 points · 1 comment · 9 min read · LW link
- The (not so) paradoxical asymmetry between position and momentum (Erik Jenner; Mar 28, 2021, 1:31 PM) · 21 points · 10 comments · 4 min read · LW link
- ejenner’s Shortform (Erik Jenner; Jul 28, 2020, 10:42 AM) · 2 points · 35 comments · LW link