
RogerDearnaley

Karma: 1,451

I’m a staff artificial intelligence engineer in Silicon Valley currently working with LLMs, and I have been interested in AI alignment, safety, and interpretability for the last 15 years. I’m now actively looking for employment in this area.

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley, 11 Jul 2024 10:01 UTC
29 points
35 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley, 6 Jul 2024 1:23 UTC
56 points
39 comments · 24 min read · LW link

7. Evolution and Ethics

RogerDearnaley, 15 Feb 2024 23:38 UTC
3 points
6 comments · 6 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley, 14 Feb 2024 7:10 UTC
38 points
9 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley, 1 Feb 2024 21:15 UTC
13 points
15 comments · 13 min read · LW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley, 26 Jan 2024 3:58 UTC
16 points
2 comments · 11 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley, 12 Jan 2024 6:29 UTC
19 points
3 comments · 5 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
34 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley, 9 Jan 2024 20:42 UTC
47 points
8 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley, 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley, 27 Dec 2023 6:42 UTC
33 points
41 comments · 23 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley, 18 Dec 2023 8:12 UTC
30 points
14 comments · 9 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley, 7 Dec 2023 6:14 UTC
9 points
0 comments · 11 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley, 4 Dec 2023 18:31 UTC
12 points
0 comments · 49 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points
2 comments · 25 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, 28 Nov 2023 19:56 UTC
64 points
30 comments · 11 min read · LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley, 24 Nov 2023 4:56 UTC
10 points
0 comments · 4 min read · LW link

3. Uploading

RogerDearnaley, 23 Nov 2023 7:39 UTC
21 points
5 comments · 8 min read · LW link

2. AIs as Economic Agents

RogerDearnaley, 23 Nov 2023 7:07 UTC
9 points
2 comments · 6 min read · LW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley, 17 Nov 2023 20:55 UTC
16 points
8 comments · 15 min read · LW link