
Steven Byrnes

Karma: 27,970

I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. I’m also at: Substack, X/Twitter, Bluesky, RSS, email, and more at this link. See https://sjbyrnes.com/agi.html for a summary of my research and a sorted list of writing. Physicist by training. Leave me anonymous feedback here.

“Act-based approval-directed agents”, for IDA skeptics

Steven Byrnes · 18 Mar 2026 18:47 UTC
55 points
3 comments · 5 min read · LW link

You can’t imitation-learn how to continual-learn

Steven Byrnes · 16 Mar 2026 21:20 UTC
123 points
28 comments · 6 min read · LW link

Podcast: Jeremy Howard is bearish on LLMs

Steven Byrnes · 6 Mar 2026 21:39 UTC
78 points
24 comments · 5 min read · LW link
(www.youtube.com)

Why we should expect ruthless sociopath ASI

Steven Byrnes · 18 Feb 2026 17:28 UTC
160 points
63 comments · 8 min read · LW link

The brain is a machine that runs an algorithm

Steven Byrnes · 17 Feb 2026 19:36 UTC
102 points
17 comments · 4 min read · LW link

In (highly contingent!) defense of interpretability-in-the-loop ML training

Steven Byrnes · 6 Feb 2026 16:32 UTC
83 points
11 comments · 3 min read · LW link

The nature of LLM algorithmic progress (v2)

Steven Byrnes · 5 Feb 2026 19:17 UTC
114 points
23 comments · 13 min read · LW link

Are there lessons from high-reliability engineering for AGI safety?

Steven Byrnes · 2 Feb 2026 15:26 UTC
159 points
15 comments · 8 min read · LW link

New version of “Intro to Brain-Like-AGI Safety”

Steven Byrnes · 23 Jan 2026 16:21 UTC
58 points
1 comment · 19 min read · LW link

My AGI safety research—2025 review, ’26 plans

Steven Byrnes · 11 Dec 2025 17:05 UTC
136 points
4 comments · 12 min read · LW link

Reward Function Design: a starter pack

Steven Byrnes · 8 Dec 2025 19:15 UTC
80 points
13 comments · 16 min read · LW link

We need a field of Reward Function Design

Steven Byrnes · 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read · LW link

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes · 3 Dec 2025 18:37 UTC
361 points
92 comments · 17 min read · LW link

Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking

Steven Byrnes · 12 Nov 2025 20:40 UTC
42 points
6 comments · 17 min read · LW link

Social drives 1: “Sympathy Reward”, from compassion to dehumanization

Steven Byrnes · 10 Nov 2025 14:53 UTC
36 points
7 comments · 13 min read · LW link

Excerpts from my neuroscience to-do list

Steven Byrnes · 6 Oct 2025 21:05 UTC
28 points
2 comments · 4 min read · LW link

Optical rectennas are not a promising clean energy technology

Steven Byrnes · 11 Sep 2025 23:08 UTC
90 points
2 comments · 6 min read · LW link

Neuroscience of human sexual attraction triggers (3 hypotheses)

Steven Byrnes · 25 Aug 2025 17:51 UTC
67 points
9 comments · 12 min read · LW link

Four ways learning Econ makes people dumber re: future AI

Steven Byrnes · 21 Aug 2025 17:52 UTC
364 points
52 comments · 6 min read · LW link
(x.com)

[Question] Inscrutability was always inevitable, right?

Steven Byrnes · 6 Aug 2025 21:57 UTC
100 points
33 comments · 2 min read · LW link