Seth Herd

Karma: 2,634

I did computational cognitive neuroscience research from getting my PhD in 2006 until the end of 2022. I’ve worked on computational theories of vision, executive function, episodic memory, and decision-making, focusing on the emergent interactions needed to explain complex thought. I became increasingly concerned with AGI applications of the research, and reluctant to publish my best ideas. I’m incredibly excited to now be working directly on alignment, currently with generous funding from the Astera Institute. More info and publication list here.

Goals selected from learned knowledge: an alternative to RL alignment

Seth Herd · 15 Jan 2024 21:52 UTC
39 points
17 comments · 7 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points
2 comments · 25 min read · LW link

Corrigibility or DWIM is an attractive primary goal for AGI

Seth Herd · 25 Nov 2023 19:37 UTC
18 points
4 comments · 1 min read · LW link

Sapience, understanding, and “AGI”

Seth Herd · 24 Nov 2023 15:13 UTC
15 points
3 comments · 6 min read · LW link

Altman returns as OpenAI CEO with new board

Seth Herd · 22 Nov 2023 16:04 UTC
5 points
3 comments · 1 min read · LW link

OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns

Seth Herd · 20 Nov 2023 14:20 UTC
52 points
29 comments · 1 min read · LW link
(www.wired.com)

We have promising alignment plans with low taxes

Seth Herd · 10 Nov 2023 18:51 UTC
30 points
9 comments · 5 min read · LW link

Seth Herd’s Shortform

Seth Herd · 10 Nov 2023 6:52 UTC
6 points
2 comments · 1 min read · LW link

Shane Legg interview on alignment

Seth Herd · 28 Oct 2023 19:28 UTC
65 points
20 comments · 2 min read · LW link
(www.youtube.com)

The (partial) fallacy of dumb superintelligence

Seth Herd · 18 Oct 2023 21:25 UTC
27 points
5 comments · 4 min read · LW link

Steering subsystems: capabilities, agency, and alignment

Seth Herd · 29 Sep 2023 13:45 UTC
22 points
0 comments · 8 min read · LW link

AGI isn’t just a technology

Seth Herd · 1 Sep 2023 14:35 UTC
18 points
12 comments · 2 min read · LW link

Internal independent review for language model agent alignment

Seth Herd · 7 Jul 2023 6:54 UTC
53 points
26 comments · 11 min read · LW link

Simpler explanations of AGI risk

Seth Herd · 14 May 2023 1:29 UTC
8 points
9 comments · 3 min read · LW link

A simple presentation of AI risk arguments

Seth Herd · 26 Apr 2023 2:19 UTC
16 points
0 comments · 2 min read · LW link

Capabilities and alignment of LLM cognitive architectures

Seth Herd · 18 Apr 2023 16:29 UTC
80 points
18 comments · 20 min read · LW link

Agentized LLMs will change the alignment landscape

Seth Herd · 9 Apr 2023 2:29 UTC
153 points
95 comments · 3 min read · LW link

AI scares and changing public beliefs

Seth Herd · 6 Apr 2023 18:51 UTC
45 points
21 comments · 6 min read · LW link

The alignment stability problem

Seth Herd · 26 Mar 2023 2:10 UTC
24 points
10 comments · 4 min read · LW link

Human preferences as RL critic values—implications for alignment

Seth Herd · 14 Mar 2023 22:10 UTC
21 points
6 comments · 6 min read · LW link