So You Want To Make Marginal Progress...

johnswentworth Feb 7, 2025, 11:22 PM
294 points
42 comments 4 min read LW link

Reasons-based choice and cluelessness

JesseClifton Feb 7, 2025, 10:21 PM
34 points
0 comments 10 min read LW link

[Translation] In the Age of AI don’t Look for Unicorns

mushroomsoup Feb 7, 2025, 9:06 PM
3 points
0 comments 10 min read LW link

Racing Towards Fusion and AI

Jeffrey Heninger Feb 7, 2025, 8:40 PM
48 points
11 comments 7 min read LW link

‘High-Level Machine Intelligence’ and ‘Full Automation of Labor’ in the AI Impacts Surveys

Jeffrey Heninger Feb 7, 2025, 8:40 PM
11 points
1 comment 7 min read LW link

Request for Information for a new US AI Action Plan (OSTP RFI)

agucova Feb 7, 2025, 8:40 PM
5 points
0 comments LW link
(www.federalregister.gov)

A Problem to Solve Before Building a Deception Detector

Feb 7, 2025, 7:35 PM
71 points
12 comments 14 min read LW link

Request for proposals: improving capability evaluations

cb Feb 7, 2025, 6:51 PM
1 point
0 comments 1 min read LW link
(www.openphilanthropy.org)

How AI Takeover Might Happen in 2 Years

joshc Feb 7, 2025, 5:10 PM
422 points
137 comments 29 min read LW link
(x.com)

the devil’s ontology

lostinwilliamsburg Feb 7, 2025, 2:18 PM
−1 points
14 comments 6 min read LW link

On the Meta and DeepMind Safety Frameworks

Zvi Feb 7, 2025, 1:10 PM
45 points
1 comment 17 min read LW link
(thezvi.wordpress.com)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

Feb 7, 2025, 3:57 AM
29 points
0 comments 10 min read LW link

When you downvote, explain why

KvmanThinking Feb 7, 2025, 1:03 AM
4 points
31 comments 1 min read LW link

Medical Windfall Prizes

PeterMcCluskey Feb 6, 2025, 11:33 PM
5 points
1 comment 5 min read LW link
(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

Feb 6, 2025, 7:18 PM
11 points
0 comments 9 min read LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

Feb 6, 2025, 6:58 PM
111 points
0 comments 1 min read LW link
(www.openphilanthropy.org)

AISN #47: Reasoning Models

Feb 6, 2025, 6:52 PM
3 points
0 comments 4 min read LW link
(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

Bentham's Bulldog Feb 6, 2025, 4:15 PM
26 points
18 comments 7 min read LW link

Detecting Strategic Deception Using Linear Probes

Feb 6, 2025, 3:46 PM
102 points
9 comments 2 min read LW link
(arxiv.org)

AI #102: Made in America

Zvi Feb 6, 2025, 2:20 PM
26 points
18 comments 67 min read LW link
(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions Feb 6, 2025, 11:26 AM
−3 points
5 comments 2 min read LW link
(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda Feb 6, 2025, 11:03 AM
73 points
7 comments 8 min read LW link

Don’t go bankrupt, don’t go rogue

Nathan Young Feb 6, 2025, 10:31 AM
20 points
1 comment 7 min read LW link

Voting Results for the 2023 Review

Raemon Feb 6, 2025, 8:00 AM
86 points
3 comments 69 min read LW link

Chicanery: No

Screwtape Feb 6, 2025, 5:42 AM
31 points
10 comments 5 min read LW link

[Question] hypnosis question

KvmanThinking Feb 6, 2025, 2:41 AM
3 points
5 comments 1 min read LW link

BIDA Calendar iCal Feed

jefftk Feb 6, 2025, 1:30 AM
9 points
0 comments 1 min read LW link
(www.jefftk.com)

C’mon guys, Deliberate Practice is Real

Raemon Feb 5, 2025, 10:33 PM
99 points
25 comments 9 min read LW link

The Risk of Gradual Disempowerment from AI

Zvi Feb 5, 2025, 10:10 PM
87 points
20 comments 20 min read LW link
(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon Feb 5, 2025, 9:32 PM
88 points
45 comments 2 min read LW link
(web.archive.org)

On AI Scaling

harsimony Feb 5, 2025, 8:24 PM
6 points
3 comments 8 min read LW link
(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams Feb 5, 2025, 7:17 PM
21 points
0 comments LW link
(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox Feb 5, 2025, 6:58 PM
17 points
0 comments 11 min read LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev Feb 5, 2025, 6:58 PM
12 points
0 comments 8 min read LW link

Making the case for average-case AI Control

Nathaniel Mitrani Feb 5, 2025, 6:56 PM
4 points
0 comments 5 min read LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay Feb 5, 2025, 6:17 PM
6 points
7 comments 1 min read LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris Feb 5, 2025, 4:02 PM
1 point
0 comments 1 min read LW link

Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety

jamesnorris Feb 5, 2025, 4:02 PM
−9 points
2 comments 1 min read LW link

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni Feb 5, 2025, 1:50 PM
76 points
1 comment 10 min read LW link

Deploying the Observer will save humanity from existential threats

Aram Panasenco Feb 5, 2025, 10:39 AM
−11 points
8 comments 1 min read LW link

The Domain of Orthogonality

mgfcatherall Feb 5, 2025, 8:14 AM
1 point
0 comments 7 min read LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape Feb 5, 2025, 4:30 AM
96 points
18 comments 6 min read LW link

[Question] Why isn’t AI containment the primary AI safety strategy?

OKlogic Feb 5, 2025, 3:54 AM
1 point
3 comments 3 min read LW link

Nick Land: Orthogonality

lumpenspace Feb 4, 2025, 9:07 PM
12 points
37 comments 8 min read LW link

What working on AI safety taught me about B2B SaaS sales

purple fire Feb 4, 2025, 8:50 PM
7 points
12 comments 5 min read LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Feb 4, 2025, 8:34 PM
45 points
22 comments 5 min read LW link

Anti-Slop Interventions?

abramdemski Feb 4, 2025, 7:50 PM
76 points
33 comments 6 min read LW link

Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails

Devina Jain Feb 4, 2025, 7:10 PM
3 points
0 comments 10 min read LW link

[Question] Journalism student looking for sources

pinkerton Feb 4, 2025, 6:58 PM
11 points
3 comments 1 min read LW link

We’re in Deep Research

Zvi Feb 4, 2025, 5:20 PM
45 points
2 comments 20 min read LW link
(thezvi.wordpress.com)