Collapsing the Belief/Knowledge Distinction

Jeremias · Sep 11, 2024, 9:24 PM
−7 points
8 comments · 1 min read

Programming Refusal with Conditional Activation Steering

Bruce W. Lee · Sep 11, 2024, 8:57 PM
41 points
0 comments · 11 min read
(brucewlee.com)

Checking public figures on whether they “answered the question” quick analysis from Harris/Trump debate, and a proposal

david reinstein · Sep 11, 2024, 8:25 PM
7 points
4 comments · 1 min read
(open.substack.com)

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

Sep 11, 2024, 7:14 PM
5 points
1 comment · 5 min read
(newsletter.safe.ai)

Refactoring cryonics as structural brain preservation

Andy_McKenzie · Sep 11, 2024, 6:36 PM
101 points
14 comments · 3 min read

[Question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal

doomyeser · Sep 11, 2024, 6:07 PM
9 points
9 comments · 3 min read

How to discover the nature of sentience, and ethics

Gustavo Ramires · Sep 11, 2024, 5:22 PM
−2 points
5 comments · 5 min read

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities

c.trout · Sep 11, 2024, 3:09 PM
24 points
2 comments · 3 min read

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions

James Stephen Brown · Sep 11, 2024, 9:53 AM
5 points
0 comments · 8 min read
(nonzerosum.games)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Andrew_Critch · Sep 11, 2024, 4:41 AM
53 points
11 comments · 3 min read

A necessary Membrane formalism feature

ThomasCederborg · Sep 10, 2024, 9:33 PM
20 points
6 comments · 11 min read

Formalizing the Informal (event invite)

abramdemski · Sep 10, 2024, 7:22 PM
42 points
0 comments · 1 min read

AI #80: Never Have I Ever

Zvi · Sep 10, 2024, 5:50 PM
46 points
20 comments · 39 min read
(thezvi.wordpress.com)

The Best Lay Argument is not a Simple English Yud Essay

J Bostock · Sep 10, 2024, 5:34 PM
253 points
15 comments · 5 min read

Economics Roundup #3

Zvi · Sep 10, 2024, 1:50 PM
44 points
9 comments · 20 min read
(thezvi.wordpress.com)

Amplify is hiring! Work with us to support field-building initiatives through digital marketing

gergogaspar · Sep 10, 2024, 8:56 AM
0 points
1 comment · 4 min read

What bootstraps intelligence?

invertedpassion · Sep 10, 2024, 7:11 AM
2 points
2 comments · 1 min read

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)

Declan Molony · Sep 10, 2024, 5:54 AM
16 points
12 comments · 1 min read

Simon DeDeo on Explore vs Exploit in Science

Elizabeth · Sep 10, 2024, 3:40 AM
20 points
0 comments · 1 min read
(acesounderglass.com)

Virtue is a Vector

robotelvis · Sep 10, 2024, 3:02 AM
9 points
1 comment · 9 min read
(messyprogress.substack.com)

MIT FutureTech are hiring for a Technical Associate role

peterslattery · Sep 9, 2024, 8:16 PM
3 points
0 comments · 3 min read

AI forecasting bots incoming

Sep 9, 2024, 7:14 PM
29 points
44 comments · 4 min read
(www.safe.ai)

My takes on SB-1047

leogao · Sep 9, 2024, 6:38 PM
151 points
8 comments · 4 min read

[Question] Building an Inexpensive, Aesthetic, Private Forum

Aaron Graifman · Sep 9, 2024, 5:10 PM
13 points
15 comments · 1 min read

[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)

Fernando Avalos · Sep 9, 2024, 3:33 AM
6 points
1 comment · 1 min read
(forum.effectivealtruism.org)

[Question] Has Anyone Here Consciously Changed Their Passions?

Spade · Sep 9, 2024, 1:36 AM
11 points
12 comments · 1 min read

Pollsters Should Publish Question Translations

jefftk · Sep 8, 2024, 10:10 PM
60 points
3 comments · 2 min read
(www.jefftk.com)

On Fables and Nuanced Charts

Niko_McCarty · Sep 8, 2024, 5:09 PM
35 points
2 comments · 8 min read
(www.asimov.press)

Contra Yudkowsky on 2-4-6 Game Difficulty Explanations

Josh Hickman · Sep 8, 2024, 4:13 PM
6 points
1 comment · 2 min read
(xn--2r8hmb.ws)

Attachment THEORY AND THE EFFECTS OF SECURE ATTACHMENT ON CHILD DEVELOPMENT

Mihriban Temel · Sep 8, 2024, 4:09 PM
−8 points
0 comments · 9 min read

Fictional parasites very different from our own

Abhishaike Mahajan · Sep 8, 2024, 2:59 PM
25 points
0 comments · 4 min read
(www.owlposting.com)

My Number 1 Epistemology Book Recommendation: Inventing Temperature

adamShimi · Sep 8, 2024, 2:30 PM
121 points
18 comments · 3 min read
(epistemologicalfascinations.substack.com)

[Question] I want a good multi-LLM API-powered chatbot

rotatingpaguro · Sep 8, 2024, 9:40 AM
10 points
5 comments · 1 min read

That Alien Message—The Animation

Writer · Sep 7, 2024, 2:53 PM
144 points
10 comments · 8 min read
(youtu.be)

Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps

Daniel C · Sep 7, 2024, 10:04 AM
19 points
18 comments · 2 min read
(x.com)

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl · Sep 7, 2024, 2:37 AM
214 points
19 comments · 1 min read

Excerpts from “A Reader’s Manifesto”

Arjun Panickssery · Sep 6, 2024, 10:37 PM
72 points
1 comment · 13 min read
(arjunpanickssery.substack.com)

Fun With CellxGene

sarahconstantin · Sep 6, 2024, 10:00 PM
30 points
2 comments · 7 min read
(sarahconstantin.substack.com)

[Question] Is this voting system strategy proof?

Donald Hobson · Sep 6, 2024, 8:44 PM
17 points
9 comments · 1 min read

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Sep 6, 2024, 5:55 PM
70 points
7 comments · 4 min read

Backdoors as an analogy for deceptive alignment

Sep 6, 2024, 3:30 PM
104 points
2 comments · 8 min read
(www.alignment.org)

A Cable Holder for 2 Cent

Johannes C. Mayer · Sep 6, 2024, 11:01 AM
1 point
1 comment · 1 min read

Perhaps Try a Little Therapy, As a Treat?

segfault · Sep 6, 2024, 8:51 AM
−187 points
61 comments · 16 min read

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs

Sep 6, 2024, 2:28 AM
28 points
0 comments · 12 min read

Distinguish worst-case analysis from instrumental training-gaming

Sep 5, 2024, 7:13 PM
38 points
0 comments · 5 min read

AI x Human Flourishing: Introducing the Cosmos Institute

Brendan McCord · Sep 5, 2024, 6:23 PM
14 points
5 comments · 6 min read
(cosmosinstitute.substack.com)

What is SB 1047 *for*?

Raemon · Sep 5, 2024, 5:39 PM
61 points
8 comments · 3 min read

instruction tuning and autoregressive distribution shift

nostalgebraist · Sep 5, 2024, 4:53 PM
40 points
5 comments · 5 min read

Conflating value alignment and intent alignment is causing confusion

Seth Herd · Sep 5, 2024, 4:39 PM
49 points
18 comments · 5 min read

A bet for Samo Burja

Nathan Helm-Burger · Sep 5, 2024, 4:01 PM
14 points
2 comments · 2 min read