Thoughts On (Solving) Deep Deception

Jozdien · 21 Oct 2023 22:40 UTC
66 points
2 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · 21 Oct 2023 21:13 UTC
41 points
0 comments · 2 min read · LW link

Soups as Spreads

jefftk · 21 Oct 2023 20:30 UTC
22 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · 21 Oct 2023 19:43 UTC
8 points
0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · 21 Oct 2023 15:22 UTC
238 points
50 comments · 13 min read · LW link

How to find a good moving service

Ziyue Wang · 21 Oct 2023 4:59 UTC
8 points
0 comments · 3 min read · LW link

Apply for MATS Winter 2023-24!

21 Oct 2023 2:27 UTC
106 points
6 comments · 5 min read · LW link

[Question] Can we isolate neurons that recognize features vs. those which have some other role?

Joshua Clancy · 21 Oct 2023 0:30 UTC
4 points
2 comments · 3 min read · LW link

Muddling Along Is More Likely Than Dystopia

Jeffrey Heninger · 20 Oct 2023 21:25 UTC
82 points
10 comments · 8 min read · LW link

What’s Hard About The Shutdown Problem

johnswentworth · 20 Oct 2023 21:13 UTC
98 points
31 comments · 4 min read · LW link

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

20 Oct 2023 21:04 UTC
157 points
30 comments · 27 min read · LW link

TOMORROW: the largest AI Safety protest ever!

Holly_Elmore · 20 Oct 2023 18:15 UTC
101 points
25 comments · 2 min read · LW link

The Overkill Conspiracy Hypothesis

ymeskhout · 20 Oct 2023 16:51 UTC
25 points
8 comments · 7 min read · LW link

I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines

307th · 20 Oct 2023 16:37 UTC
115 points
32 comments · 9 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · 20 Oct 2023 14:53 UTC
15 points
0 comments · 5 min read · LW link

On the proper date for solstice celebrations

jchan · 20 Oct 2023 13:55 UTC
16 points
0 comments · 4 min read · LW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon · 20 Oct 2023 13:07 UTC
11 points
4 comments · 6 min read · LW link

Mechanistic interpretability of LLM analogy-making

Sergii · 20 Oct 2023 12:53 UTC
2 points
0 comments · 4 min read · LW link
(grgv.xyz)

How To Socialize With Psycho(logist)s

Sable · 20 Oct 2023 11:33 UTC
34 points
11 comments · 3 min read · LW link
(affablyevil.substack.com)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp · 20 Oct 2023 7:32 UTC
118 points
14 comments · 22 min read · LW link

Features and Adversaries in MemoryDT

20 Oct 2023 7:32 UTC
31 points
6 comments · 25 min read · LW link

AI Safety Hub Serbia Soft Launch

DusanDNesic · 20 Oct 2023 7:11 UTC
65 points
1 comment · 3 min read · LW link
(forum.effectivealtruism.org)

Announcing new round of “Key Phenomena in AI Risk” Reading Group

20 Oct 2023 7:11 UTC
13 points
2 comments · 1 min read · LW link

Unpacking the dynamics of AGI conflict that suggest the necessity of a premptive pivotal act

Eli Tyre · 20 Oct 2023 6:48 UTC
53 points
2 comments · 8 min read · LW link

Genocide isn’t Decolonization

robotelvis · 20 Oct 2023 4:14 UTC
26 points
19 comments · 5 min read · LW link
(messyprogress.substack.com)

Trying to understand John Wentworth’s research agenda

20 Oct 2023 0:05 UTC
92 points
11 comments · 12 min read · LW link

Boost your productivity, happiness and health with this one weird trick

ajc586 · 19 Oct 2023 23:30 UTC
9 points
9 comments · 1 min read · LW link

A Good Explanation of Differential Gears

Johannes C. Mayer · 19 Oct 2023 23:07 UTC
46 points
4 comments · 1 min read · LW link
(youtu.be)

Evening Wiki(pedia) Workout

mcint · 19 Oct 2023 21:29 UTC
1 point
1 comment · 1 min read · LW link

New roles on my team: come build Open Phil’s technical AI safety program with me!

Ajeya Cotra · 19 Oct 2023 16:47 UTC
83 points
6 comments · 4 min read · LW link

[Question] Infinite tower of meta-probability

fryolysis · 19 Oct 2023 16:44 UTC
6 points
5 comments · 3 min read · LW link

A NotKillEveryoneIsm Argument for Accelerating Deep Learning Research

Logan Zoellner · 19 Oct 2023 16:28 UTC
−7 points
6 comments · 5 min read · LW link
(midwitalignment.substack.com)

Knowledge Base 5: Business model

iwis · 19 Oct 2023 16:06 UTC
−6 points
2 comments · 1 min read · LW link

AI #34: Chipping Away at Chip Exports

Zvi · 19 Oct 2023 15:00 UTC
36 points
19 comments · 59 min read · LW link
(thezvi.wordpress.com)

Is Yann LeCun strawmanning AI x-risks?

Chris_Leong · 19 Oct 2023 11:35 UTC
25 points
4 comments · 1 min read · LW link

[Video] Too much Empiricism kills you

Johannes C. Mayer · 19 Oct 2023 5:08 UTC
14 points
0 comments · 1 min read · LW link
(youtu.be)

Are humans misaligned with evolution?

19 Oct 2023 3:14 UTC
42 points
13 comments · 18 min read · LW link

Brains, Planes, Blimps, and Algorithms

ai dan · 18 Oct 2023 21:26 UTC
1 point
0 comments · 6 min read · LW link

The (partial) fallacy of dumb superintelligence

Seth Herd · 18 Oct 2023 21:25 UTC
27 points
5 comments · 4 min read · LW link

[Question] Does AI governance needs a “Federalist papers” debate?

azsantosk · 18 Oct 2023 21:08 UTC
40 points
4 comments · 1 min read · LW link

Metaculus Launches Conditional Cup to Explore Linked Forecasts

ChristianWilliams · 18 Oct 2023 20:41 UTC
9 points
0 comments · 1 min read · LW link
(www.metaculus.com)

AI Safety 101 : Reward Misspecification

markov · 18 Oct 2023 20:39 UTC
30 points
4 comments · 31 min read · LW link

2023 East Coast Rationalist Megameetup

Screwtape · 18 Oct 2023 20:33 UTC
8 points
0 comments · 1 min read · LW link

Superforecasting the premises in “Is power-seeking AI an existential risk?”

Joe Carlsmith · 18 Oct 2023 20:23 UTC
31 points
3 comments · 5 min read · LW link

The Real Fanfic Is The Friends We Made Along The Way

Eneasz · 18 Oct 2023 19:21 UTC
83 points
0 comments · 27 min read · LW link
(deathisbad.substack.com)

AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Back to the Past to the Future

Prometheus · 18 Oct 2023 16:51 UTC
5 points
0 comments · 1 min read · LW link

How to Eradicate Global Extreme Poverty [RA video with fundraiser!]

18 Oct 2023 15:51 UTC
50 points
5 comments · 9 min read · LW link
(youtu.be)

On Interpretability’s Robustness

WCargo · 18 Oct 2023 13:18 UTC
11 points
0 comments · 4 min read · LW link