Jozdien

Karma: 3,156

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

Alek Westover, SebastianP, Alexa Pan and Jozdien

18 Jun 2026 21:21 UTC

57 points

4 comments, 5 min read

(blog.redwoodresearch.org)

Incriminating misaligned AI models via distillation

Alek Westover, SebastianP, Alex Mallen, Jozdien, Alexa Pan, Julian Stastny and Vivek Hebbar

15 May 2026 21:43 UTC

115 points

12 comments, 5 min read

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers

Jozdien and Alex Mallen

28 Apr 2026 18:00 UTC

55 points

2 comments, 7 min read

Desiderata of good problems to hand off to AIs

Jozdien

19 Jan 2026 16:55 UTC

29 points

1 comment, 4 min read

How hard is it to inoculate against misalignment generalization?

Jozdien

6 Jan 2026 17:30 UTC

46 points

4 comments, 14 min read

Reasoning Models Sometimes Output Illegible Chains of Thought

Jozdien

24 Nov 2025 18:24 UTC

83 points

9 comments, 6 min read

Realistic Reward Hacking Induces Different and Deeper Misalignment

Jozdien

9 Oct 2025 18:45 UTC

146 points

2 comments, 23 min read

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

Sam Marks, Nevan Wichers, Daniel Tan, Aram Ebtekar, Jozdien, David Africa, Alex Mallen and Fabien Roger

8 Oct 2025 22:02 UTC

176 points

37 comments, 2 min read

Why Do Some Language Models Fake Alignment While Others Don’t?

abhayesian, John Hughes, Alex Mallen, Jozdien, janus and Fabien Roger

8 Jul 2025 21:49 UTC

159 points

14 comments, 5 min read

(arxiv.org)

Lighthaven Sequences Reading Group #36 (Tuesday 5/27)

Jozdien, Garrett Baker, Ben Pace, the Vacationing Vagabond, Ronny Fernandez and Aella

26 May 2025 23:52 UTC

8 points

0 comments, 1 min read

Lighthaven Sequences Reading Group #35 (Tuesday 5/20)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Jozdien

19 May 2025 20:58 UTC

8 points

0 comments, 1 min read

Lighthaven Sequences Reading Group #34 (Tuesday 5/13)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Jozdien

10 May 2025 7:42 UTC

8 points

0 comments, 1 min read

Lighthaven Sequences Reading Group #33 (Tuesday 5/6)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Jozdien

30 Apr 2025 3:39 UTC

8 points

0 comments, 1 min read

[LAPTOP REQUIRED] Lighthaven Sequences Reading Group #32 (Tuesday 04/29)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Jozdien

26 Apr 2025 3:53 UTC

12 points

0 comments, 2 min read

Lighthaven Sequences Reading Group #31 (Tuesday 04/22)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Jozdien

16 Apr 2025 4:46 UTC

7 points

0 comments, 1 min read

Lighthaven Sequences Reading Group #30 (Tuesday 04/15)

Jozdien, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Garrett Baker

14 Apr 2025 1:18 UTC

8 points

0 comments, 2 min read

Lighthaven Sequences Reading Group #29 (Tuesday 04/08)

Jozdien, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond, Garrett Baker and orthonormal

4 Apr 2025 1:16 UTC

9 points

0 comments, 2 min read

Introducing BenchBench: An Industry Standard Benchmark for AI Strength

Jozdien

2 Apr 2025 2:11 UTC

49 points

0 comments, 2 min read

Lighthaven Sequences Reading Group #28 (Tuesday 04/01)

Jozdien, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Garrett Baker

26 Mar 2025 2:43 UTC

12 points

0 comments, 1 min read

Lighthaven Sequences Reading Group #27 (Tuesday 03/25)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace, the Vacationing Vagabond and Jozdien

20 Mar 2025 4:34 UTC

14 points

0 comments, 2 min read