Backdoors have universal representations across large language models

Dec 6, 2024, 10:56 PM
16 points
0 comments · 16 min read · LW link

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Dec 6, 2024, 10:19 PM
165 points
12 comments · 11 min read · LW link
(arxiv.org)

Understanding Shapley Values with Venn Diagrams

Carson L · Dec 6, 2024, 9:56 PM
214 points
36 comments · LW link
(medium.com)

Model Integrity

Dec 6, 2024, 9:28 PM
4 points
1 comment · 18 min read · LW link

Can AI improve the current state of molecular simulation?

Abhishaike Mahajan · Dec 6, 2024, 8:22 PM
5 points
0 comments · 1 min read · LW link
(www.owlposting.com)

Low Temperature Solomonoff Induction

dil-leik-og · Dec 6, 2024, 6:55 PM
10 points
4 comments · 11 min read · LW link

Experiments are in the territory, results are in the map

Tahp · Dec 6, 2024, 3:44 PM
5 points
1 comment · 6 min read · LW link

A car journey with conservative evangelicals—Understanding some British political-religious beliefs

Nathan Young · Dec 6, 2024, 11:22 AM
41 points
8 comments · 6 min read · LW link
(nathanpmyoung.substack.com)

Frontier Models are Capable of In-context Scheming

Dec 5, 2024, 10:11 PM
203 points
24 comments · 7 min read · LW link

Should you be worried about H5N1?

gw · Dec 5, 2024, 9:11 PM
89 points
2 comments · 5 min read · LW link
(www.georgeyw.com)

o1 tried to avoid being shut down

Raelifin · Dec 5, 2024, 7:52 PM
10 points
5 comments · 1 min read · LW link
(www.transformernews.ai)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]

Bill Benzon · Dec 5, 2024, 7:36 PM
4 points
0 comments · 4 min read · LW link

Expevolu, a laissez-faire approach to country creation

Fernando · Dec 5, 2024, 7:29 PM
4 points
4 comments · 44 min read · LW link
(expevolu.substack.com)

Are SAE features from the Base Model still meaningful to LLaVA?

Shan23Chen · Dec 5, 2024, 7:24 PM
5 points
2 comments · 10 min read · LW link

OpenAI o1 + ChatGPT Pro release

anaguma · Dec 5, 2024, 7:13 PM
5 points
0 comments · 1 min read · LW link
(openai.com)

Smart people should do biology

Haotian · Dec 5, 2024, 7:11 PM
11 points
2 comments · 3 min read · LW link

Announcement: AI for Math Fund

sarahconstantin · Dec 5, 2024, 6:33 PM
20 points
9 comments · 2 min read · LW link
(renaissancephilanthropy.org)

Detection of Asymptomatically Spreading Pathogens

jefftk · Dec 5, 2024, 6:20 PM
45 points
8 comments · 7 min read · LW link
(www.jefftk.com)

Model Integrity: MAI on Value Alignment

Jonas Hallgren · Dec 5, 2024, 5:11 PM
6 points
11 comments · 1 min read · LW link
(meaningalignment.substack.com)

Social Science in its epistemological context

Arturo Macias · Dec 5, 2024, 4:12 PM
3 points
0 comments · 1 min read · LW link
(www.theseedsofscience.pub)

Higher and lower pleasures

Chris_Leong · Dec 5, 2024, 1:13 PM
19 points
3 comments · 1 min read · LW link

Sam Harris’s Argument For Objective Morality

Zero Contradictions · Dec 5, 2024, 10:19 AM
7 points
5 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Morality as Cooperation Part III: Failure Modes

DeLesley Hutchins · Dec 5, 2024, 9:39 AM
4 points
0 comments · 20 min read · LW link

Morality as Cooperation Part II: Theory and Experiment

DeLesley Hutchins · Dec 5, 2024, 9:04 AM
2 points
0 comments · 17 min read · LW link

Morality as Cooperation Part I: Humans

DeLesley Hutchins · Dec 5, 2024, 8:16 AM
5 points
0 comments · 19 min read · LW link

I Finally Worked Through Bayes’ Theorem (Personal Achievement)

keltan · Dec 5, 2024, 2:04 AM
53 points
7 comments · 9 min read · LW link

The Dream Machine

sarahconstantin · Dec 5, 2024, 12:00 AM
117 points
6 comments · 12 min read · LW link
(sarahconstantin.substack.com)

Should you have children? A decision framework for a crucial life choice that affects yourself, your child and the world

Sherrinford · Dec 4, 2024, 11:14 PM
0 points
1 comment · 20 min read · LW link

CCing Mailing Lists on External Communication

jefftk · Dec 4, 2024, 10:00 PM
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Picking favourites is hard

dkl9 · Dec 4, 2024, 8:46 PM
11 points
3 comments · 1 min read · LW link
(dkl9.net)

[Question] How can I convince my cryptobro friend that S&P500 is efficient?

AhmedNeedsATherapist · Dec 4, 2024, 8:04 PM
−7 points
10 comments · 1 min read · LW link

The 2023 LessWrong Review: The Basic Ask

Raemon · Dec 4, 2024, 7:52 PM
77 points
25 comments · 9 min read · LW link

Is the AI Doomsday Narrative the Product of a Big Tech Conspiracy?

garrison · Dec 4, 2024, 7:20 PM
35 points
1 comment · LW link
(garrisonlovely.substack.com)

[Question] AI box question

KvmanThinking · Dec 4, 2024, 7:03 PM
2 points
2 comments · 1 min read · LW link

The Polite Coup

Charlie Sanders · Dec 4, 2024, 2:03 PM
3 points
0 comments · 3 min read · LW link
(www.dailymicrofiction.com)

Analysis of Global AI Governance Strategies

Dec 4, 2024, 10:45 AM
49 points
10 comments · 36 min read · LW link

[Question] Cryonics considerations: how big of a problem is ischemia?

kman · Dec 4, 2024, 4:45 AM
8 points
1 comment · 1 min read · LW link

AI #93: Happy Tuesday

Zvi · Dec 4, 2024, 12:30 AM
26 points
2 comments · 23 min read · LW link
(thezvi.wordpress.com)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch · Dec 3, 2024, 9:57 PM
64 points
2 comments · LW link

Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models

Dec 3, 2024, 9:19 PM
106 points
8 comments · 41 min read · LW link

“Alignment at Large”: Bending the Arc of History Towards Life-Affirming Futures

welfvh · Dec 3, 2024, 9:17 PM
5 points
0 comments · 4 min read · LW link

Roots of Progress is hiring an event manager

jasoncrawford · Dec 3, 2024, 8:46 PM
10 points
0 comments · 7 min read · LW link
(rootsofprogress.notion.site)

Do simulacra dream of digital sheep?

EuanMcLean · Dec 3, 2024, 8:25 PM
16 points
36 comments · 10 min read · LW link

Orca communication project—seeking feedback (and collaborators)

Towards_Keeperhood · Dec 3, 2024, 5:29 PM
38 points
16 comments · 2 min read · LW link

Book a Time to Chat about Interp Research

Logan Riggs · Dec 3, 2024, 5:27 PM
47 points
3 comments · 1 min read · LW link

Balsa Research 2024 Update

Zvi · Dec 3, 2024, 12:30 PM
21 points
0 comments · 5 min read · LW link
(thezvi.wordpress.com)

First Solo Bus Ride

jefftk · Dec 3, 2024, 12:20 PM
28 points
1 comment · 1 min read · LW link
(www.jefftk.com)

How to make evals for the AISI evals bounty

TheManxLoiner · Dec 3, 2024, 10:44 AM
9 points
0 comments · 5 min read · LW link

Should there be just one western AGI project?

Dec 3, 2024, 10:11 AM
78 points
75 comments · 15 min read · LW link
(www.forethought.org)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft

Andrew_Critch · Dec 3, 2024, 9:29 AM
48 points
2 comments · 5 min read · LW link