
RogerDearnaley

Karma: 2,638

I’m a staff AI engineer and researcher working with LLMs, and I have been interested in AI alignment, safety, and interpretability for the last 17 years. I did research on this during MATS summer 2025, and am now an independent researcher at Meridian in Cambridge. I’m currently looking for either employment or funding to work on this subject in the London/Cambridge area in the UK.

[Question] How Hard a Problem is Alignment?

RogerDearnaley · 11 Mar 2026 16:47 UTC
21 points
15 comments · 3 min read · LW link

How Hard a Problem is Alignment? (My Opinionated Answer)

RogerDearnaley · 11 Mar 2026 16:46 UTC
48 points
4 comments · 67 min read · LW link

Shaping the exploration of the motivation-space matters for AI safety

6 Mar 2026 14:43 UTC
77 points
13 comments · 10 min read · LW link

Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?

RogerDearnaley · 21 Feb 2026 1:59 UTC
34 points
2 comments · 5 min read · LW link

[Question] What’s Your P(WEIRD)?

RogerDearnaley · 16 Feb 2026 18:19 UTC
26 points
18 comments · 9 min read · LW link

The Meta-Anthropic Argument

RogerDearnaley · 2 Feb 2026 1:10 UTC
41 points
55 comments · 2 min read · LW link

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

RogerDearnaley · 19 Jan 2026 21:24 UTC
105 points
12 comments · 11 min read · LW link
(arxiv.org)

Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV

RogerDearnaley · 23 Dec 2025 3:40 UTC
41 points
25 comments · 20 min read · LW link

The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

RogerDearnaley · 28 May 2025 6:21 UTC
36 points
34 comments · 9 min read · LW link

Why Aligning an LLM is Hard, and How to Make it Easier

RogerDearnaley · 23 Jan 2025 6:44 UTC
37 points
3 comments · 4 min read · LW link

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley · 11 Jul 2024 10:01 UTC
41 points
36 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · 6 Jul 2024 1:23 UTC
66 points
43 comments · 24 min read · LW link

2.5. Evolution and Ethics

RogerDearnaley · 15 Feb 2024 23:38 UTC
8 points
12 comments · 7 min read · LW link · 1 review

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · 14 Feb 2024 7:10 UTC
47 points
12 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · 1 Feb 2024 21:15 UTC
15 points
15 comments · 13 min read · LW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley · 26 Jan 2024 3:58 UTC
17 points
2 comments · 11 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley · 12 Jan 2024 6:29 UTC
21 points
4 comments · 5 min read · LW link · 1 review

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
37 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley · 9 Jan 2024 20:42 UTC
49 points
8 comments · 37 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link