
RogerDearnaley

Karma: 1,780

I’m a staff AI engineer working with LLMs, and I have been interested in AI alignment, safety, and interpretability for the last 15 years. I’m actively looking for employment in this area, preferably in the UK; in the meantime, I’ll be participating in SERI MATS summer 2025. I will also be attending LessOnline.

The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

RogerDearnaley · 28 May 2025 6:21 UTC
31 points
34 comments · 9 min read · LW link

Why Aligning an LLM is Hard, and How to Make it Easier

RogerDearnaley · 23 Jan 2025 6:44 UTC
34 points
3 comments · 4 min read · LW link

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley · 11 Jul 2024 10:01 UTC
40 points
35 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · 6 Jul 2024 1:23 UTC
64 points
41 comments · 24 min read · LW link

7. Evolution and Ethics

RogerDearnaley · 15 Feb 2024 23:38 UTC
6 points
7 comments · 6 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · 14 Feb 2024 7:10 UTC
41 points
12 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · 1 Feb 2024 21:15 UTC
16 points
15 comments · 13 min read · LW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley · 26 Jan 2024 3:58 UTC
16 points
2 comments · 11 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley · 12 Jan 2024 6:29 UTC
20 points
3 comments · 5 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
35 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley · 9 Jan 2024 20:42 UTC
48 points
8 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley · 27 Dec 2023 6:42 UTC
33 points
41 comments · 23 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · 18 Dec 2023 8:12 UTC
30 points
14 comments · 9 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · 7 Dec 2023 6:14 UTC
9 points
0 comments · 11 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley · 4 Dec 2023 18:31 UTC
12 points
0 comments · 49 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points
2 comments · 25 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · 28 Nov 2023 19:56 UTC
65 points
30 comments · 11 min read · LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley · 24 Nov 2023 4:56 UTC
10 points
0 comments · 4 min read · LW link

3. Uploading

RogerDearnaley · 23 Nov 2023 7:39 UTC
21 points
5 comments · 8 min read · LW link