Jeremy Gillen

Karma: 2,064

I’m interested in doing in-depth dialogues to find cruxes. Message me if you are interested in doing this.

I do alignment research, mostly stuff that is vaguely agent foundations. Currently doing independent alignment research on ontology identification. Formerly on Vivek’s team at MIRI.

Detect Goodhart and shut down

Jeremy Gillen, 22 Jan 2025 18:45 UTC
70 points
21 comments, 7 min read, LW link

Context-dependent consequentialism

4 Nov 2024 9:29 UTC
31 points
6 comments, 27 min read, LW link

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
161 points
60 comments, 57 min read, LW link

Thomas Kwa’s MIRI research experience

2 Oct 2023 16:42 UTC
174 points
53 comments, 1 min read, LW link

AISC team report: Soft-optimization, Bayes and Goodhart

27 Jun 2023 6:05 UTC
38 points
2 comments, 15 min read, LW link

Soft optimization makes the value target bigger

Jeremy Gillen, 2 Jan 2023 16:06 UTC
119 points
20 comments, 12 min read, LW link

Jeremy Gillen’s Shortform

Jeremy Gillen, 19 Oct 2022 16:14 UTC
6 points
46 comments, LW link

Neural Tangent Kernel Distillation

5 Oct 2022 18:11 UTC
76 points
20 comments, 8 min read, LW link

Inner Alignment via Superpowers

30 Aug 2022 20:01 UTC
37 points
13 comments, 4 min read, LW link

Finding Goals in the World Model

22 Aug 2022 18:06 UTC
59 points
8 comments, 13 min read, LW link

The Core of the Alignment Problem is...

17 Aug 2022 20:07 UTC
76 points
10 comments, 9 min read, LW link

Project proposal: Testing the IBP definition of agent

9 Aug 2022 1:09 UTC
21 points
4 comments, 2 min read, LW link

Broad Basins and Data Compression

8 Aug 2022 20:33 UTC
33 points
6 comments, 7 min read, LW link

Translating between Latent Spaces

30 Jul 2022 3:25 UTC
27 points
2 comments, 8 min read, LW link

Explaining inner alignment to myself

Jeremy Gillen, 24 May 2022 23:10 UTC
9 points
2 comments, 10 min read, LW link

Goodhart’s Law Causal Diagrams

11 Apr 2022 13:52 UTC
35 points
6 comments, 6 min read, LW link