
RogerDearnaley

Karma: 1,451

I’m a staff artificial intelligence engineer in Silicon Valley currently working with LLMs, and I have been interested in AI alignment, safety, and interpretability for the last 15 years. I’m now actively looking for employment in this area.

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley, 11 Jul 2024 10:01 UTC
29 points
35 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley, 6 Jul 2024 1:23 UTC
56 points
39 comments · 24 min read · LW link

7. Evolution and Ethics

RogerDearnaley, 15 Feb 2024 23:38 UTC
3 points
6 comments · 6 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley, 14 Feb 2024 7:10 UTC
38 points
9 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley, 1 Feb 2024 21:15 UTC
13 points
15 comments · 13 min read · LW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley, 26 Jan 2024 3:58 UTC
16 points
2 comments · 11 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley, 12 Jan 2024 6:29 UTC
19 points
3 comments · 5 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
34 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley, 9 Jan 2024 20:42 UTC
47 points
8 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley, 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley, 27 Dec 2023 6:42 UTC
33 points
41 comments · 23 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley, 18 Dec 2023 8:12 UTC
30 points
14 comments · 9 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley, 7 Dec 2023 6:14 UTC
9 points
0 comments · 11 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley, 4 Dec 2023 18:31 UTC
12 points
0 comments · 49 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points
2 comments · 25 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, 28 Nov 2023 19:56 UTC
64 points
30 comments · 11 min read · LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley, 24 Nov 2023 4:56 UTC
10 points
0 comments · 4 min read · LW link

3. Uploading

RogerDearnaley, 23 Nov 2023 7:39 UTC
21 points
5 comments · 8 min read · LW link

2. AIs as Economic Agents

RogerDearnaley, 23 Nov 2023 7:07 UTC
9 points
2 comments · 6 min read · LW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley, 17 Nov 2023 20:55 UTC
16 points
8 comments · 15 min read · LW link