Manifold Halloween Hackathon

Austin Chen · Oct 23, 2023, 10:47 PM
8 points

2 votes

Overall karma indicates overall quality.

0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda · Oct 23, 2023, 10:38 PM
93 points

42 votes

12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT · Oct 23, 2023, 9:00 PM
79 points

29 votes

22 comments · 39 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner · Oct 23, 2023, 8:32 PM
22 points

10 votes

0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow · Oct 23, 2023, 5:43 PM
6 points

5 votes

2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher · Oct 23, 2023, 5:24 PM
32 points

15 votes

8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

Oct 23, 2023, 4:37 PM
107 points

46 votes

3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

Oct 23, 2023, 4:33 PM
33 points

17 votes

2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

Oct 23, 2023, 2:11 PM
20 points

6 votes

2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk · Oct 23, 2023, 1:40 PM
11 points

3 votes

0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy · Oct 23, 2023, 1:21 PM
22 points

8 votes

13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane · Oct 23, 2023, 2:26 AM
35 points

18 votes

22 comments · 1 min read · LW link

Announcing Timaeus

Oct 22, 2023, 11:59 AM
188 points

83 votes

15 comments · 4 min read · LW link

Into AI Safety - Episode 0

jacobhaimes · Oct 22, 2023, 3:30 AM
5 points

4 votes

1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien · Oct 21, 2023, 10:40 PM
72 points

34 votes

6 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · Oct 21, 2023, 10:05 PM
14 points

11 votes

9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · Oct 21, 2023, 9:13 PM
42 points

16 votes

0 comments · 2 min read · LW link

Soups as Spreads

jefftk · Oct 21, 2023, 8:30 PM
22 points

14 votes

0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · Oct 21, 2023, 7:43 PM
8 points

3 votes

0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · Oct 21, 2023, 3:22 PM
266 points

124 votes

56 comments · 13 min read · LW link · 2 reviews

How to find a good moving service

Ziyue Wang · Oct 21, 2023, 4:59 AM
8 points

8 votes

0 comments · 3 min read · LW link

Apply for MATS Winter 2023-24!

Oct 21, 2023, 2:27 AM
104 points

35 votes

6 comments · 5 min read · LW link

[Question] Can we isolate neurons that recognize features vs. those which have some other role?

Joshua Clancy · Oct 21, 2023, 12:30 AM
4 points

4 votes

2 comments · 3 min read · LW link

Muddling Along Is More Likely Than Dystopia

Jeffrey Heninger · Oct 20, 2023, 9:25 PM
88 points

42 votes

10 comments · 8 min read · LW link

What’s Hard About The Shutdown Problem

johnswentworth · Oct 20, 2023, 9:13 PM
98 points

37 votes

33 comments · 4 min read · LW link

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

Oct 20, 2023, 9:04 PM
163 points

60 votes

30 comments · 27 min read · LW link

TOMORROW: the largest AI Safety protest ever!

Holly_Elmore · Oct 20, 2023, 6:15 PM
105 points

56 votes

26 comments · 2 min read · LW link

The Overkill Conspiracy Hypothesis

ymeskhout · Oct 20, 2023, 4:51 PM
27 points

14 votes

9 comments · 7 min read · LW link

I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines

307th · Oct 20, 2023, 4:37 PM
125 points

86 votes

33 comments · 9 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · Oct 20, 2023, 2:53 PM
15 points

5 votes

0 comments · 5 min read · LW link

On the proper date for solstice celebrations

jchan · Oct 20, 2023, 1:55 PM
16 points

4 votes

0 comments · 4 min read · LW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon · Oct 20, 2023, 1:07 PM
11 points

6 votes

4 comments · 6 min read · LW link

Mechanistic interpretability of LLM analogy-making

Sergii · Oct 20, 2023, 12:53 PM
2 points

1 vote

0 comments · 4 min read · LW link
(grgv.xyz)

How To Socialize With Psycho(logist)s

Sable · Oct 20, 2023, 11:33 AM
37 points

17 votes

11 comments · 3 min read · LW link
(affablyevil.substack.com)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp · Oct 20, 2023, 7:32 AM
119 points

50 votes

15 comments · 22 min read · LW link

Features and Adversaries in MemoryDT

Oct 20, 2023, 7:32 AM
31 points

15 votes

6 comments · 25 min read · LW link

AI Safety Hub Serbia Soft Launch

DusanDNesic · Oct 20, 2023, 7:11 AM
64 points

35 votes

1 comment · 3 min read · LW link
(forum.effectivealtruism.org)

Announcing new round of “Key Phenomena in AI Risk” Reading Group

Oct 20, 2023, 7:11 AM
15 points

7 votes

2 comments · 1 min read · LW link

Unpacking the dynamics of AGI conflict that suggest the necessity of a premptive pivotal act

Eli Tyre · Oct 20, 2023, 6:48 AM
63 points

18 votes

2 comments · 8 min read · LW link

Genocide isn’t Decolonization

robotelvis · Oct 20, 2023, 4:14 AM
33 points

62 votes

20 comments · 5 min read · LW link
(messyprogress.substack.com)

Trying to understand John Wentworth’s research agenda

Oct 20, 2023, 12:05 AM
96 points

42 votes

13 comments · 12 min read · LW link

Boost your productivity, happiness and health with this one weird trick

ajc586 · Oct 19, 2023, 11:30 PM
9 points

8 votes

9 comments · 1 min read · LW link

A Good Explanation of Differential Gears

Johannes C. Mayer · Oct 19, 2023, 11:07 PM
48 points

19 votes

4 comments · 1 min read · LW link
(youtu.be)

Evening Wiki(pedia) Workout

mcint · Oct 19, 2023, 9:29 PM
1 point

1 vote

1 comment · 1 min read · LW link

New roles on my team: come build Open Phil’s technical AI safety program with me!

Ajeya Cotra · Oct 19, 2023, 4:47 PM
83 points

32 votes

6 comments · 4 min read · LW link

[Question] Infinite tower of meta-probability

fryolysis · Oct 19, 2023, 4:44 PM
6 points

7 votes

5 comments · 3 min read · LW link

A NotKillEveryoneIsm Argument for Accelerating Deep Learning Research

Logan Zoellner · Oct 19, 2023, 4:28 PM
−6 points

8 votes

6 comments · 5 min read · LW link
(midwitalignment.substack.com)

Knowledge Base 5: Business model

iwis · Oct 19, 2023, 4:06 PM
−4 points

3 votes

2 comments · 1 min read · LW link

AI #34: Chipping Away at Chip Exports

Zvi · Oct 19, 2023, 3:00 PM
36 points

26 votes

19 comments · 59 min read · LW link
(thezvi.wordpress.com)

Is Yann LeCun strawmanning AI x-risks?

Chris_Leong · Oct 19, 2023, 11:35 AM
26 points

18 votes

4 comments · 1 min read · LW link