Erik Jenner

Karma: 1,540

PhD student in AI safety at CHAI (UC Berkeley)

ARC paper: Formalizing the presumption of independence

Erik Jenner, 20 Nov 2022 1:22 UTC

97 points

2 comments, 2 min read

(arxiv.org)

Research agenda: Formalizing abstractions of computations

Erik Jenner, 2 Feb 2023 4:29 UTC

91 points

10 comments, 31 min read

Response to Katja Grace’s AI x-risk counterarguments

Erik Jenner and Johannes Treutlein

19 Oct 2022 1:17 UTC

77 points

18 comments, 15 min read

A comparison of causal scrubbing, causal abstractions, and related methods

Erik Jenner, Adrià Garriga-Alonso and Egor Zverev

8 Jun 2023 23:40 UTC

72 points

3 comments, 22 min read

Sydney can play chess and kind of keep track of the board state

Erik Jenner, 3 Mar 2023 9:39 UTC

62 points

19 comments, 6 min read

Good ontologies induce commutative diagrams

Erik Jenner, 9 Oct 2022 0:06 UTC

49 points

5 comments, 14 min read

A gentle introduction to mechanistic anomaly detection

Erik Jenner, 3 Apr 2024 23:06 UTC

45 points

0 comments, 11 min read

How are you dealing with ontology identification?

Erik Jenner, 4 Oct 2022 23:28 UTC

34 points

10 comments, 3 min read

CHAI internship applications are open (due Nov 13)

Erik Jenner, 26 Oct 2023 0:53 UTC

34 points

0 comments, 3 min read

Breaking down the training/deployment dichotomy

Erik Jenner, 28 Aug 2022 21:45 UTC

30 points

3 comments, 3 min read

Concrete empirical research projects in mechanistic anomaly detection

Erik Jenner, Viktor Rehnberg and Oliver Daniels-Koch

3 Apr 2024 23:07 UTC

27 points

0 comments, 10 min read

[Question] What is a decision theory as a mathematical object?

Erik Jenner, 25 May 2020 13:44 UTC

26 points

3 comments, 1 min read

Subsets and quotients in interpretability

Erik Jenner, 2 Dec 2022 23:13 UTC

26 points

1 comment, 7 min read

Reward model hacking as a challenge for reward learning

Erik Jenner, 12 Apr 2022 9:39 UTC

25 points

1 comment, 9 min read

The (not so) paradoxical asymmetry between position and momentum

Erik Jenner, 28 Mar 2021 13:31 UTC

21 points

10 comments, 4 min read

Disentangling inner alignment failures

Erik Jenner, 10 Oct 2022 18:50 UTC

20 points

5 comments, 4 min read

Abstractions as morphisms between (co)algebras

Erik Jenner, 14 Jan 2023 1:51 UTC

17 points

1 comment, 8 min read

Solution to the free will homework problem

Erik Jenner, 24 Nov 2019 11:49 UTC

2 points

6 comments, 2 min read