Erik Jenner
Karma: 1,540
PhD student in AI safety at CHAI (UC Berkeley)
ARC paper: Formalizing the presumption of independence
Erik Jenner
20 Nov 2022 1:22 UTC
97
points
2
comments
2
min read
(arxiv.org)
Research agenda: Formalizing abstractions of computations
Erik Jenner
2 Feb 2023 4:29 UTC
91
points
10
comments
31
min read
Response to Katja Grace’s AI x-risk counterarguments
Erik Jenner
and
Johannes Treutlein
19 Oct 2022 1:17 UTC
77
points
18
comments
15
min read
A comparison of causal scrubbing, causal abstractions, and related methods
Erik Jenner
,
Adrià Garriga-alonso
and
Egor Zverev
8 Jun 2023 23:40 UTC
72
points
3
comments
22
min read
Sydney can play chess and kind of keep track of the board state
Erik Jenner
3 Mar 2023 9:39 UTC
62
points
19
comments
6
min read
Good ontologies induce commutative diagrams
Erik Jenner
9 Oct 2022 0:06 UTC
49
points
5
comments
14
min read
A gentle introduction to mechanistic anomaly detection
Erik Jenner
3 Apr 2024 23:06 UTC
45
points
0
comments
11
min read
How are you dealing with ontology identification?
Erik Jenner
4 Oct 2022 23:28 UTC
34
points
10
comments
3
min read
CHAI internship applications are open (due Nov 13)
Erik Jenner
26 Oct 2023 0:53 UTC
34
points
0
comments
3
min read
Breaking down the training/deployment dichotomy
Erik Jenner
28 Aug 2022 21:45 UTC
30
points
3
comments
3
min read
Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner
,
Viktor Rehnberg
and
Oliver Daniels-Koch
3 Apr 2024 23:07 UTC
27
points
0
comments
10
min read
[Question]
What is a decision theory as a mathematical object?
Erik Jenner
25 May 2020 13:44 UTC
26
points
3
comments
1
min read
Subsets and quotients in interpretability
Erik Jenner
2 Dec 2022 23:13 UTC
26
points
1
comment
7
min read
Reward model hacking as a challenge for reward learning
Erik Jenner
12 Apr 2022 9:39 UTC
25
points
1
comment
9
min read
The (not so) paradoxical asymmetry between position and momentum
Erik Jenner
28 Mar 2021 13:31 UTC
21
points
10
comments
4
min read
Disentangling inner alignment failures
Erik Jenner
10 Oct 2022 18:50 UTC
20
points
5
comments
4
min read
Abstractions as morphisms between (co)algebras
Erik Jenner
14 Jan 2023 1:51 UTC
17
points
1
comment
8
min read
Solution to the free will homework problem
Erik Jenner
24 Nov 2019 11:49 UTC
2
points
6
comments
2
min read