
Jeremy Gillen

Karma: 1,017

I do alignment research, mostly work that is vaguely agent foundations. Formerly on Vivek's team at MIRI. Most of my writing before mid-2023 is not representative of my current views about alignment difficulty.

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
160 points
60 comments · 57 min read · LW link

Thomas Kwa's MIRI research experience

2 Oct 2023 16:42 UTC
169 points
52 comments · 1 min read · LW link

AISC team report: Soft-optimization, Bayes and Goodhart

27 Jun 2023 6:05 UTC
37 points
2 comments · 15 min read · LW link

Soft optimization makes the value target bigger

Jeremy Gillen · 2 Jan 2023 16:06 UTC
117 points
20 comments · 12 min read · LW link

Jeremy Gillen’s Shortform

Jeremy Gillen · 19 Oct 2022 16:14 UTC
4 points
10 comments · 1 min read · LW link

Neural Tangent Kernel Distillation

5 Oct 2022 18:11 UTC
76 points
20 comments · 8 min read · LW link

Inner Alignment via Superpowers

30 Aug 2022 20:01 UTC
37 points
13 comments · 4 min read · LW link

Finding Goals in the World Model

22 Aug 2022 18:06 UTC
59 points
8 comments · 13 min read · LW link

The Core of the Alignment Problem is...

17 Aug 2022 20:07 UTC
74 points
10 comments · 9 min read · LW link

Project proposal: Testing the IBP definition of agent

9 Aug 2022 1:09 UTC
21 points
4 comments · 2 min read · LW link

Broad Basins and Data Compression

8 Aug 2022 20:33 UTC
33 points
6 comments · 7 min read · LW link

Translating between Latent Spaces

30 Jul 2022 3:25 UTC
27 points
2 comments · 8 min read · LW link

Explaining inner alignment to myself

Jeremy Gillen · 24 May 2022 23:10 UTC
9 points
2 comments · 10 min read · LW link

Goodhart's Law Causal Diagrams

11 Apr 2022 13:52 UTC
32 points
5 comments · 6 min read · LW link