I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn
Steven Byrnes
There’s a (hopefully obvious) failure mode where the AGI doomer walks up to the AI capabilities researcher and says “Screw you for hastening the apocalypse. You should join me in opposing knowledge and progress.” Then the AI capabilities researcher responds “No, screw you, and leave me alone”. Not only is this useless, but it’s strongly counterproductive: that researcher will now be far more inclined to ignore and reject future outreach efforts (“Oh, pfft, I’ve already heard the argument for that, it’s stupid”), even if those future outreach efforts are better.
So the first step to good outreach is not treating AI capabilities researchers as the enemy. We need to view them as our future allies, and gently win them over to our side by the force of good arguments that meet them where they’re at, in a spirit of pedagogy and truth-seeking.
(You can maybe be more direct in telling someone that they’re doing counterproductive capabilities research when they’re already sold on AGI doom. That’s probably why your conversation on the EleutherAI Discord went OK.)
(In addition to “it would be directly super-counterproductive”, a second-order reason not to try to sabotage AI capabilities research is that “the kind of people who are attracted to movements that involve sabotaging enemies” has essentially no overlap with “the kind of people who we want to be part of our movement to avoid AGI doom”, in my opinion.)
So I endorse “get existing top AGI researchers to stop” as a good thing in the sense that if I had a magic wand I might wish for it (at least until we make more progress on AGI safety). But that’s very different from thinking that people should go out and directly try to do that.
Instead, I think the best approach to “get existing top AGI researchers to stop” is producing good pedagogy, engaging in gentle, good-faith arguments (as opposed to gotchas) when the subject comes up, and continuing to do the research that may lead to more crisp and rigorous arguments for why AGI doom is likely (if indeed it’s likely; note that there are reasonable people who have heard, parsed, and engaged with all the arguments about AGI doom but still think the probability of doom is <10%).
I do a lot of that kind of activity myself (1, 2, 3, 4, 5, etc.).