[Question] What’s in your list of unsolved problems in AI alignment?

jacquesthibs · Mar 7, 2023, 6:58 PM
60 points
9 comments · 1 min read · LW link

Principles for Productive Group Meetings

jsteinhardt · Mar 22, 2023, 12:50 AM
60 points
1 comment · 13 min read · LW link
(bounded-regret.ghost.io)

Role Architectures: Applying LLMs to consequential tasks

Eric Drexler · Mar 30, 2023, 3:00 PM
60 points
7 comments · 9 min read · LW link

Chatbot convinces Belgian to commit suicide

Jeroen De Ryck · Mar 28, 2023, 6:14 PM
60 points
18 comments · 3 min read · LW link
(www.standaard.be)

The Natural State is Goodhart

devansh · Mar 20, 2023, 12:00 AM
59 points
4 comments · 2 min read · LW link

On the Crisis at Silicon Valley Bank

Zvi · Mar 16, 2023, 3:50 PM
59 points
9 comments · 41 min read · LW link
(thezvi.wordpress.com)

Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?

Jeffrey Ladish · Mar 10, 2023, 8:21 AM
58 points
3 comments · 9 min read · LW link

Shannon’s Surprising Discovery

johnswentworth · Mar 30, 2023, 8:15 PM
57 points
7 comments · 8 min read · LW link

Against ubiquitous alignment taxes

beren · Mar 6, 2023, 7:50 PM
57 points
10 comments · 2 min read · LW link

What can we learn from Lex Fridman’s interview with Sam Altman?

Karl von Wendt · Mar 27, 2023, 6:27 AM
56 points
22 comments · 9 min read · LW link

AI Governance & Strategy: Priorities, talent gaps, & opportunities

Orpheus16 · Mar 3, 2023, 6:09 PM
56 points
2 comments · 4 min read · LW link

GPT-4 solves Gary Marcus-induced flubs

JakubK · Mar 17, 2023, 6:40 AM
56 points
29 comments · 2 min read · LW link
(docs.google.com)

Robin Hanson’s latest AI risk position statement

Liron · Mar 3, 2023, 2:25 PM
55 points
18 comments · 1 min read · LW link
(www.overcomingbias.com)

AI #3

Zvi · Mar 9, 2023, 12:20 PM
55 points
12 comments · 62 min read · LW link
(thezvi.wordpress.com)

Try to solve the hard parts of the alignment problem

Mikhail Samin · Mar 18, 2023, 2:55 PM
54 points
33 comments · 5 min read · LW link

Lessons from Convergent Evolution for AI Alignment

Mar 27, 2023, 4:25 PM
54 points
9 comments · 8 min read · LW link

Why did you trash the old HPMOR.com?

AnnoyedReader · Mar 6, 2023, 1:55 AM
54 points
68 comments · 2 min read · LW link

Against Deep Ideas

YafahEdelman · Mar 19, 2023, 3:04 AM
53 points
14 comments · 2 min read · LW link

~100 Interesting Questions

RohanS · Mar 30, 2023, 1:57 PM
53 points
18 comments · 9 min read · LW link

A Primer On Chaos

johnswentworth · Mar 28, 2023, 6:01 PM
53 points
9 comments · 9 min read · LW link

The algorithm isn’t doing X, it’s just doing Y.

Cleo Nardo · Mar 16, 2023, 11:28 PM
53 points
43 comments · 5 min read · LW link

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish · Mar 16, 2023, 11:29 PM
53 points
3 comments · 3 min read · LW link

Questions about Conjecture’s CoEm proposal

Mar 9, 2023, 7:32 PM
51 points
4 comments · 2 min read · LW link

Some ML-Related Math I Now Understand Better

Fabien Roger · Mar 9, 2023, 4:35 PM
50 points
6 comments · 4 min read · LW link

[Question] Challenge: Does ChatGPT ever claim that a bad outcome for humanity is actually good?

Yair Halberstadt · Mar 22, 2023, 4:01 PM
49 points
29 comments · 1 min read · LW link

[Question] Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)’s training corpus?

AnnaSalamon · Mar 29, 2023, 5:17 AM
49 points
2 comments · 1 min read · LW link

How well did Manifold predict GPT-4?

David Chee · Mar 15, 2023, 11:19 PM
49 points
5 comments · 2 min read · LW link

The hot mess theory of AI misalignment: More intelligent agents behave less coherently

Jonathan Yan · Mar 10, 2023, 12:20 AM
48 points
22 comments · 1 min read · LW link
(sohl-dickstein.github.io)

Othello-GPT: Future Work I Am Excited About

Neel Nanda · Mar 29, 2023, 10:13 PM
48 points
2 comments · 33 min read · LW link
(neelnanda.io)

Who Aligns the Alignment Researchers?

Ben Smith · Mar 5, 2023, 11:22 PM
48 points
0 comments · 11 min read · LW link

[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques

Mar 16, 2023, 4:38 PM
48 points
0 comments · 13 min read · LW link

2022 Survey Results

Screwtape · Mar 8, 2023, 7:16 PM
48 points
8 comments · 20 min read · LW link

Fighting without hope

Orpheus16 · Mar 1, 2023, 6:15 PM
47 points
14 comments · 4 min read · LW link · 1 review

Coordination explosion before intelligence explosion...?

tailcalled · Mar 5, 2023, 8:48 PM
47 points
9 comments · 2 min read · LW link

Nose / throat treatments for respiratory infections

juliawise · Mar 13, 2023, 2:41 AM
47 points
6 comments · 8 min read · LW link

Podcast Transcript: Daniela and Dario Amodei on Anthropic

remember · Mar 7, 2023, 4:47 PM
46 points
2 comments · 79 min read · LW link
(futureoflife.org)

A Brief Defense of Athleticism

Wofsen · Mar 7, 2023, 8:48 PM
46 points
5 comments · 1 min read · LW link

[Question] Does polyamory at a workplace turn nepotism up to eleven?

Viliam · Mar 5, 2023, 12:57 AM
45 points
11 comments · 2 min read · LW link

Write a Book?

jefftk · Mar 16, 2023, 12:10 AM
45 points
7 comments · 3 min read · LW link
(www.jefftk.com)

Implied “utilities” of simulators are broad, dense, and shallow

porby · Mar 1, 2023, 3:23 AM
45 points
7 comments · 3 min read · LW link

Attribution Patching: Activation Patching At Industrial Scale

Neel Nanda · Mar 16, 2023, 9:44 PM
45 points
10 comments · 58 min read · LW link
(www.neelnanda.io)

Do humans derive values from fictitious imputed coherence?

TsviBT · Mar 5, 2023, 3:23 PM
45 points
8 comments · 14 min read · LW link

The Power of Intelligence - The Animation

Writer · Mar 11, 2023, 4:15 PM
45 points
3 comments · 1 min read · LW link
(youtu.be)

Wittgenstein and ML - parameters vs architecture

Cleo Nardo · Mar 24, 2023, 4:54 AM
44 points
9 comments · 5 min read · LW link

Meetup Tip: The Next Meetup Will Be. . .

Screwtape · Mar 17, 2023, 10:04 PM
44 points
0 comments · 3 min read · LW link

Some constructions for proof-based cooperation without Löb

James Payor · Mar 21, 2023, 4:12 PM
43 points
3 comments · 4 min read · LW link

Draft: Introduction to optimization

Alex_Altair · Mar 26, 2023, 5:25 PM
43 points
8 comments · 16 min read · LW link

[Question] Why no major LLMs with memory?

Kaj_Sotala · Mar 28, 2023, 4:34 PM
42 points
15 comments · 1 min read · LW link

Project “MIRI as a Service”

RomanS · Mar 8, 2023, 7:22 PM
42 points
4 comments · 1 min read · LW link

How popular is ChatGPT? Part 2: slower growth than Pokémon GO

Richard Korzekwa · Mar 3, 2023, 11:40 PM
42 points
4 comments · 6 min read · LW link
(aiimpacts.org)