FrontierMath Score of o3-mini Much Lower Than Claimed

YafahEdelman · 17 Mar 2025 22:41 UTC
61 points
7 comments · 1 min read · LW link

Proof-of-Concept Debugger for a Small LLM

17 Mar 2025 22:27 UTC
27 points
0 comments · 11 min read · LW link

Effectively Communicating with DC Policymakers

PolicyTakes · 17 Mar 2025 22:11 UTC
14 points
0 comments · 2 min read · LW link

EIS XV: A New Proof of Concept for Useful Interpretability

scasper · 17 Mar 2025 20:05 UTC
30 points
2 comments · 3 min read · LW link

Sentinel’s Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.

NunoSempere · 17 Mar 2025 19:34 UTC
59 points
3 comments · 6 min read · LW link
(blog.sentinel-team.org)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

17 Mar 2025 19:11 UTC
188 points
9 comments · 6 min read · LW link

Three Types of Intelligence Explosion

17 Mar 2025 14:47 UTC
40 points
8 comments · 3 min read · LW link
(www.forethought.org)

An Advent of Thought

Kaarel · 17 Mar 2025 14:21 UTC
57 points
13 comments · 48 min read · LW link

Interested in working from a new Boston AI Safety Hub?

17 Mar 2025 13:42 UTC
17 points
0 comments · 2 min read · LW link

Other Civilizations Would Recover 84+% of Our Cosmic Resources—A Challenge to Extinction Risk Prioritization

Maxime Riché · 17 Mar 2025 13:12 UTC
5 points
0 comments · 12 min read · LW link

Monthly Roundup #28: March 2025

Zvi · 17 Mar 2025 12:50 UTC
31 points
8 comments · 14 min read · LW link
(thezvi.wordpress.com)

Are corporations superintelligent?

17 Mar 2025 10:36 UTC
3 points
3 comments · 1 min read · LW link
(aisafety.info)

One pager

samuelshadrach · 17 Mar 2025 8:12 UTC
6 points
2 comments · 8 min read · LW link
(samuelshadrach.com)

The Case for AI Optimism

Annapurna · 17 Mar 2025 1:29 UTC
−6 points
1 comment · 1 min read · LW link
(nationalaffairs.com)

Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

16 Mar 2025 23:23 UTC
45 points
8 comments · 13 min read · LW link

Read More News

utilistrutil · 16 Mar 2025 21:31 UTC
25 points
2 comments · 5 min read · LW link

What would a post labor economy *actually* look like?

Ansh Juneja · 16 Mar 2025 20:38 UTC
3 points
2 comments · 17 min read · LW link

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas · 16 Mar 2025 18:54 UTC
206 points
36 comments · 3 min read · LW link

How I’ve run major projects

benkuhn · 16 Mar 2025 18:40 UTC
127 points
10 comments · 8 min read · LW link
(www.benkuhn.net)

Counting Objections to Housing

jefftk · 16 Mar 2025 18:20 UTC
13 points
7 comments · 3 min read · LW link
(www.jefftk.com)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · 16 Mar 2025 16:52 UTC
161 points
26 comments · 1 min read · LW link

Siberian Arctic origins of East Asian psychology

David Sun · 16 Mar 2025 16:52 UTC
6 points
0 comments · 1 min read · LW link

AI Model History is Being Lost

Vale · 16 Mar 2025 12:38 UTC
19 points
1 comment · 1 min read · LW link
(vale.rocks)

Metacognition Broke My Nail-Biting Habit

Rafka · 16 Mar 2025 12:36 UTC
45 points
20 comments · 2 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt · 16 Mar 2025 8:06 UTC
22 points
8 comments · 1 min read · LW link

Can time preferences make AI safe?

TerriLeaf · 15 Mar 2025 21:41 UTC
2 points
1 comment · 2 min read · LW link

Help make the orca language experiment happen

Towards_Keeperhood · 15 Mar 2025 21:39 UTC
9 points
12 comments · 5 min read · LW link

Announcing EXP: Experimental Summer Workshop on Collective Cognition

15 Mar 2025 20:14 UTC
36 points
2 comments · 4 min read · LW link

AI Self-Correction vs. Self-Reflection: Is There a Fundamental Difference?

Project Solon · 15 Mar 2025 18:24 UTC
−3 points
0 comments · 1 min read · LW link

The Fork in the Road

testingthewaters · 15 Mar 2025 17:36 UTC
14 points
12 comments · 2 min read · LW link

Any-Benefit Mindset and Any-Reason Reasoning

silentbob · 15 Mar 2025 17:10 UTC
36 points
9 comments · 6 min read · LW link

deleted

funnyfranco · 15 Mar 2025 15:24 UTC
−1 points
2 comments · 1 min read · LW link

Paper: Field-building and the epistemic culture of AI safety

peterslattery · 15 Mar 2025 12:30 UTC
13 points
3 comments · 3 min read · LW link
(firstmonday.org)

deleted

funnyfranco · 15 Mar 2025 6:08 UTC
8 points
0 comments · 1 min read · LW link

AI Says It’s Not Conscious. That’s a Bad Answer to the Wrong Question.

JohnMarkNorman · 15 Mar 2025 1:25 UTC
1 point
0 comments · 2 min read · LW link

Report & retrospective on the Dovetail fellowship

Alex_Altair · 14 Mar 2025 23:20 UTC
26 points
3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira · 14 Mar 2025 23:07 UTC
11 points
4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse · 14 Mar 2025 22:51 UTC
14 points
20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape · 14 Mar 2025 22:29 UTC
110 points
28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma · 14 Mar 2025 21:18 UTC
2 points
2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus · 14 Mar 2025 20:54 UTC
7 points
2 comments · 1 min read · LW link
(asving.com)

AI for Epistemics Hackathon

Austin Chen · 14 Mar 2025 20:46 UTC
76 points
12 comments · 10 min read · LW link
(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson · 14 Mar 2025 19:11 UTC
16 points
0 comments · 8 min read · LW link

AI Tools for Existential Security

14 Mar 2025 18:38 UTC
22 points
4 comments · 11 min read · LW link
(www.forethought.org)

deleted

funnyfranco · 14 Mar 2025 18:14 UTC
−3 points
2 comments · 1 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron · 14 Mar 2025 15:45 UTC
5 points
0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith · 14 Mar 2025 15:00 UTC
79 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Evaluating the ROI of Information

Mr. Keating · 14 Mar 2025 14:22 UTC
13 points
3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi · 14 Mar 2025 12:30 UTC
53 points
2 comments · 13 min read · LW link
(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd · 14 Mar 2025 9:48 UTC
28 points
2 comments · 9 min read · LW link