3 levels of threat obfuscation

HoldenKarnofsky Aug 2, 2023, 2:58 PM
69 points
14 comments, 7 min read

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan Aug 10, 2023, 12:48 AM
66 points
35 comments, 6 min read

Steven Wolfram on AI Alignment

Bill Benzon Aug 20, 2023, 7:49 PM
66 points
15 comments, 4 min read

Managing risks of our own work

Beth Barnes Aug 18, 2023, 12:41 AM
66 points
0 comments, 2 min read

“Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them

Aug 20, 2023, 9:13 AM
66 points
4 comments, 3 min read

State of Generally Available Self-Driving

jefftk Aug 22, 2023, 6:50 PM
66 points
6 comments, 2 min read
(www.jefftk.com)

AI Regulation May Be More Important Than AI Alignment For Existential Safety

otto.barten Aug 24, 2023, 11:41 AM
65 points
39 comments, 5 min read

A short calculation about a Twitter poll

Ege Erdil Aug 14, 2023, 7:48 PM
64 points
64 comments, 11 min read

Ideas for improving epistemics in AI safety outreach

mic Aug 21, 2023, 7:55 PM
64 points
6 comments, 3 min read

What Does a Marginal Grant at LTFF Look Like? Funding Priorities and Grantmaking Thresholds at the Long-Term Future Fund

Aug 11, 2023, 3:59 AM
64 points
0 comments, 1 min read
(forum.effectivealtruism.org)

“Is There Anything That’s Worth More”

Zack_M_Davis Aug 2, 2023, 3:28 AM
64 points
6 comments, 1 min read

DIY Deliberate Practice

lynettebye Aug 21, 2023, 12:22 PM
63 points
4 comments, 5 min read
(lynettebye.com)

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy Aug 29, 2023, 10:56 AM
63 points
13 comments, 1 min read
(www.youtube.com)

Private notes on LW?

Raemon Aug 4, 2023, 5:35 PM
61 points
33 comments, 1 min read

‘We’re changing the clouds.’ An unforeseen test of geoengineering is fueling record ocean warmth

Annapurna Aug 6, 2023, 8:58 PM
60 points
6 comments, 1 min read
(www.science.org)

AI #25: Inflection Point

Zvi Aug 17, 2023, 2:40 PM
59 points
9 comments, 36 min read
(thezvi.wordpress.com)

If we had known the atmosphere would ignite

Jeffs Aug 16, 2023, 8:28 PM
59 points
63 comments, 2 min read

AI #23: Fundamental Problems with RLHF

Zvi Aug 3, 2023, 12:50 PM
59 points
9 comments, 41 min read
(thezvi.wordpress.com)

Will AI kill everyone? Here’s what the godfathers of AI have to say [RA video]

Writer Aug 19, 2023, 5:29 PM
58 points
8 comments
(youtu.be)

Stomach Ulcers and Dental Cavities

Metacelsus Aug 5, 2023, 2:08 PM
57 points
7 comments, 1 min read
(denovo.substack.com)

Open Call for Research Assistants in Developmental Interpretability

Aug 30, 2023, 9:02 AM
56 points
11 comments, 4 min read

Diet Experiment Preregistration: Long-term water fasting + seed oil removal

lc Aug 23, 2023, 10:08 PM
56 points
18 comments, 1 min read

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Aug 29, 2023, 1:29 AM
54 points
3 comments, 10 min read

The lost millennium

Ege Erdil Aug 24, 2023, 3:48 AM
54 points
14 comments, 3 min read

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Prometheus Aug 23, 2023, 1:16 PM
51 points
11 comments, 4 min read

Efficiency and resource use scaling parity

Ege Erdil Aug 21, 2023, 12:18 AM
51 points
1 comment, 4 min read, 1 review

Reflections on “Making the Atomic Bomb”

boazbarak Aug 17, 2023, 2:48 AM
51 points
7 comments, 8 min read

Announcing Squiggle Hub

Aug 5, 2023, 1:00 AM
49 points
4 comments, 5 min read
(forum.effectivealtruism.org)

AI #26: Fine Tuning Time

Zvi Aug 24, 2023, 3:30 PM
49 points
6 comments, 33 min read
(thezvi.wordpress.com)

AI #24: Week of the Podcast

Zvi Aug 10, 2023, 3:00 PM
49 points
5 comments, 44 min read
(thezvi.wordpress.com)

Barbieheimer: Across the Dead Reckoning

Zvi Aug 1, 2023, 1:00 PM
49 points
17 comments, 41 min read
(thezvi.wordpress.com)

how 2 tell if ur input is out of distribution given only model weights

dkirmani Aug 5, 2023, 10:45 PM
48 points
10 comments, 1 min read

Assessment of intelligence agency functionality is difficult yet important

trevor Aug 24, 2023, 1:42 AM
48 points
5 comments, 9 min read

Perpetually Declining Population?

jefftk Aug 8, 2023, 1:30 AM
48 points
29 comments, 3 min read
(www.jefftk.com)

Chess as a case study in hidden capabilities in ChatGPT

AdamYedidia Aug 19, 2023, 6:35 AM
47 points
32 comments, 6 min read

Understanding and visualizing sycophancy datasets

Nina Panickssery Aug 16, 2023, 5:34 AM
46 points
0 comments, 6 min read

Autonomous replication and adaptation: an attempt at a concrete danger threshold

Hjalmar_Wijk Aug 17, 2023, 1:31 AM
45 points
0 comments, 13 min read

A Model-based Approach to AI Existential Risk

Aug 25, 2023, 10:32 AM
45 points
9 comments, 32 min read

Manifund: What we’re funding (weeks 2-4)

Austin Chen Aug 4, 2023, 4:00 PM
44 points
2 comments
(manifund.substack.com)

The Sinews of Sudan’s Latest War

Tim Liptrot Aug 4, 2023, 6:17 PM
43 points
12 comments, 12 min read

Is Chinese total factor productivity lower today than it was in 1956?

Ege Erdil Aug 18, 2023, 10:33 PM
43 points
0 comments, 26 min read

Monthly Roundup #9: August 2023

Zvi Aug 7, 2023, 1:20 PM
42 points
25 comments, 57 min read
(thezvi.wordpress.com)

[Linkpost] Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks

Bogdan Ionut Cirstea 12 Aug 2023 22:02 UTC
42 points
0 comments, 1 min read

Some rules for life (v.0,0)

Neil 17 Aug 2023 0:43 UTC
42 points
13 comments, 12 min read
(neilwarren.substack.com)

[Question] Which possible AI systems are relatively safe?

Zach Stein-Perlman 21 Aug 2023 17:00 UTC
42 points
20 comments, 1 min read

Walk while you talk: don’t balk at “no chalk”

dkl9 22 Aug 2023 21:27 UTC
41 points
10 comments, 2 min read
(dkl9.net)

AGI is easier than robotaxis

Daniel Kokotajlo 13 Aug 2023 17:00 UTC
41 points
30 comments, 4 min read

marine cloud brightening

bhauth 9 Aug 2023 2:50 UTC
40 points
14 comments, 3 min read
(www.bhauth.com)

Seth Explains Consciousness

Jacob Falkovich 22 Aug 2023 18:06 UTC
39 points
130 comments, 14 min read, 1 review
(putanumonit.com)

Implications of evidential cooperation in large worlds

Lukas Finnveden 23 Aug 2023 0:43 UTC
39 points
4 comments, 17 min read
(lukasfinnveden.substack.com)