
Steven Byrnes

Karma: 27,970

I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. I’m also at: Substack, X/Twitter, Bluesky, RSS, email, and more at this link. See https://sjbyrnes.com/agi.html for a summary of my research and a sorted list of writing. Physicist by training. Leave me anonymous feedback here.

“Act-based approval-directed agents”, for IDA skeptics

Steven Byrnes · 18 Mar 2026 18:47 UTC
55 points
3 comments · 5 min read · LW link

You can’t imitation-learn how to continual-learn

Steven Byrnes · 16 Mar 2026 21:20 UTC
123 points
28 comments · 6 min read · LW link

Podcast: Jeremy Howard is bearish on LLMs

Steven Byrnes · 6 Mar 2026 21:39 UTC
78 points
24 comments · 5 min read · LW link
(www.youtube.com)

Why we should expect ruthless sociopath ASI

Steven Byrnes · 18 Feb 2026 17:28 UTC
160 points
63 comments · 8 min read · LW link

The brain is a machine that runs an algorithm

Steven Byrnes · 17 Feb 2026 19:36 UTC
102 points
17 comments · 4 min read · LW link

In (highly contingent!) defense of interpretability-in-the-loop ML training

Steven Byrnes · 6 Feb 2026 16:32 UTC
83 points
11 comments · 3 min read · LW link

The nature of LLM algorithmic progress (v2)

Steven Byrnes · 5 Feb 2026 19:17 UTC
114 points
23 comments · 13 min read · LW link

Are there lessons from high-reliability engineering for AGI safety?

Steven Byrnes · 2 Feb 2026 15:26 UTC
159 points
15 comments · 8 min read · LW link

New version of “Intro to Brain-Like-AGI Safety”

Steven Byrnes · 23 Jan 2026 16:21 UTC
58 points
1 comment · 19 min read · LW link

My AGI safety research—2025 review, ’26 plans

Steven Byrnes · 11 Dec 2025 17:05 UTC
136 points
4 comments · 12 min read · LW link

Reward Function Design: a starter pack

Steven Byrnes · 8 Dec 2025 19:15 UTC
80 points
13 comments · 16 min read · LW link

We need a field of Reward Function Design

Steven Byrnes · 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read · LW link

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes · 3 Dec 2025 18:37 UTC
361 points
92 comments · 17 min read · LW link

Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking

Steven Byrnes · 12 Nov 2025 20:40 UTC
42 points
6 comments · 17 min read · LW link

Social drives 1: “Sympathy Reward”, from compassion to dehumanization

Steven Byrnes · 10 Nov 2025 14:53 UTC
36 points
7 comments · 13 min read · LW link

Excerpts from my neuroscience to-do list

Steven Byrnes · 6 Oct 2025 21:05 UTC
28 points
2 comments · 4 min read · LW link

Optical rectennas are not a promising clean energy technology

Steven Byrnes · 11 Sep 2025 23:08 UTC
90 points
2 comments · 6 min read · LW link

Neuroscience of human sexual attraction triggers (3 hypotheses)

Steven Byrnes · 25 Aug 2025 17:51 UTC
67 points
9 comments · 12 min read · LW link

Four ways learning Econ makes people dumber re: future AI

Steven Byrnes · 21 Aug 2025 17:52 UTC
364 points
52 comments · 6 min read · LW link
(x.com)

[Question] Inscrutability was always inevitable, right?

Steven Byrnes · 6 Aug 2025 21:57 UTC
100 points
33 comments · 2 min read · LW link