Medical Windfall Prizes

PeterMcCluskey · 6 Feb 2025 23:33 UTC
5 points
1 comment · 5 min read · LW link
(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

6 Feb 2025 19:18 UTC
11 points
0 comments · 9 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

6 Feb 2025 18:58 UTC
111 points
0 comments · 1 min read · LW link
(www.openphilanthropy.org)

AISN #47: Reasoning Models

6 Feb 2025 18:52 UTC
3 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

Bentham's Bulldog · 6 Feb 2025 16:15 UTC
26 points
18 comments · 7 min read · LW link

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
104 points
9 comments · 2 min read · LW link
(arxiv.org)

AI #102: Made in America

Zvi · 6 Feb 2025 14:20 UTC
26 points
18 comments · 67 min read · LW link
(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions · 6 Feb 2025 11:26 UTC
−3 points
5 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda · 6 Feb 2025 11:03 UTC
73 points
7 comments · 8 min read · LW link

Don’t go bankrupt, don’t go rogue

Nathan Young · 6 Feb 2025 10:31 UTC
20 points
1 comment · 7 min read · LW link

Voting Results for the 2023 Review

Raemon · 6 Feb 2025 8:00 UTC
86 points
3 comments · 69 min read · LW link

Chicanery: No

Screwtape · 6 Feb 2025 5:42 UTC
31 points
10 comments · 5 min read · LW link

[Question] hypnosis question

KvmanThinking · 6 Feb 2025 2:41 UTC
3 points
5 comments · 1 min read · LW link

BIDA Calendar iCal Feed

jefftk · 6 Feb 2025 1:30 UTC
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

C’mon guys, Deliberate Practice is Real

Raemon · 5 Feb 2025 22:33 UTC
99 points
25 comments · 9 min read · LW link

The Risk of Gradual Disempowerment from AI

Zvi · 5 Feb 2025 22:10 UTC
87 points
20 comments · 20 min read · LW link
(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon · 5 Feb 2025 21:32 UTC
88 points
45 comments · 2 min read · LW link
(web.archive.org)

On AI Scaling

harsimony · 5 Feb 2025 20:24 UTC
6 points
3 comments · 8 min read · LW link
(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams · 5 Feb 2025 19:17 UTC
21 points
0 comments · 6 min read · LW link
(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox · 5 Feb 2025 18:58 UTC
19 points
0 comments · 11 min read · LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev · 5 Feb 2025 18:58 UTC
13 points
0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani · 5 Feb 2025 18:56 UTC
4 points
0 comments · 5 min read · LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay · 5 Feb 2025 18:17 UTC
6 points
7 comments · 1 min read · LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris · 5 Feb 2025 16:02 UTC
7 points
0 comments · 1 min read · LW link

Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety

jamesnorris · 5 Feb 2025 16:02 UTC
−9 points
2 comments · 1 min read · LW link

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni · 5 Feb 2025 13:50 UTC
76 points
1 comment · 10 min read · LW link

Deploying the Observer will save humanity from existential threats

Aram Panasenco · 5 Feb 2025 10:39 UTC
−11 points
8 comments · 1 min read · LW link

The Domain of Orthogonality

mgfcatherall · 5 Feb 2025 8:14 UTC
1 point
0 comments · 7 min read · LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape · 5 Feb 2025 4:30 UTC
97 points
18 comments · 6 min read · LW link

[Question] Why isn’t AI containment the primary AI safety strategy?

Oliver Kuperman · 5 Feb 2025 3:54 UTC
1 point
3 comments · 3 min read · LW link

Nick Land: Orthogonality

lumpenspace · 4 Feb 2025 21:07 UTC
5 points
37 comments · 8 min read · LW link

What working on AI safety taught me about B2B SaaS sales

purple fire · 4 Feb 2025 20:50 UTC
7 points
12 comments · 5 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

4 Feb 2025 20:34 UTC
45 points
22 comments · 5 min read · LW link

Anti-Slop Interventions?

abramdemski · 4 Feb 2025 19:50 UTC
76 points
33 comments · 6 min read · LW link

Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails

Devina Jain · 4 Feb 2025 19:10 UTC
9 points
0 comments · 10 min read · LW link

[Question] Journalism student looking for sources

pinkerton · 4 Feb 2025 18:58 UTC
11 points
3 comments · 1 min read · LW link

We’re in Deep Research

Zvi · 4 Feb 2025 17:20 UTC
45 points
3 comments · 20 min read · LW link
(thezvi.wordpress.com)

The Capitalist Agent

henophilia · 4 Feb 2025 15:32 UTC
1 point
10 comments · 3 min read · LW link
(blog.hermesloom.org)

Forecasting AGI: Insights from Prediction Markets and Metaculus

Alvin Ånestrand · 4 Feb 2025 13:03 UTC
13 points
0 comments · 4 min read · LW link
(forecastingaifutures.substack.com)

Ruling Out Lookup Tables

Alfred Harwood · 4 Feb 2025 10:39 UTC
22 points
11 comments · 7 min read · LW link

Half-baked idea: a straightforward method for learning environmental goals?

Q Home · 4 Feb 2025 6:56 UTC
16 points
7 comments · 5 min read · LW link

Information Versus Action

Screwtape · 4 Feb 2025 5:13 UTC
27 points
0 comments · 6 min read · LW link

Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method

Clément L · 4 Feb 2025 4:15 UTC
6 points
1 comment · 13 min read · LW link

Tear Down the Burren

jefftk · 4 Feb 2025 3:40 UTC
45 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)

Archimedes · 4 Feb 2025 2:55 UTC
17 points
1 comment · 1 min read · LW link
(www.anthropic.com)

Can someone, anyone, make superintelligence a more concrete concept?

Ori Nagel · 4 Feb 2025 2:18 UTC
2 points
8 comments · 5 min read · LW link

What are the “no free lunch” theorems?

4 Feb 2025 2:02 UTC
19 points
4 comments · 1 min read · LW link
(aisafety.info)

eliminating bias through language?

KvmanThinking · 4 Feb 2025 1:52 UTC
1 point
12 comments · 1 min read · LW link

New Foresight Longevity Bio & Molecular Nano Grants Program

Allison Duettmann · 4 Feb 2025 0:28 UTC
11 points
0 comments · 1 min read · LW link

Meta: Frontier AI Framework

Zach Stein-Perlman · 3 Feb 2025 22:00 UTC
33 points
2 comments · 1 min read · LW link
(ai.meta.com)