The Stag Hunt—cultivating cooperation to reap rewards

James Stephen Brown · Feb 25, 2025, 11:45 PM
7 points
0 comments · 4 min read · LW link
(nonzerosum.games)

Three Levels for Large Language Model Cognition

Eleni Angelou · Feb 25, 2025, 11:14 PM
21 points
0 comments · 5 min read · LW link

[Crosspost] Strategic wealth accumulation under transformative AI expectations

Feb 25, 2025, 9:50 PM
5 points
0 comments · 17 min read · LW link
(forum.effectivealtruism.org)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
329 points
91 comments · 4 min read · LW link

We Can Build Compassionate AI

Gordon Seidoh Worley · Feb 25, 2025, 4:37 PM
9 points
6 comments · 4 min read · LW link
(uncertainupdates.substack.com)

[Question] Intellectual lifehacks repo

Antoine de Scorraille · Feb 25, 2025, 4:32 PM
11 points
15 comments · 1 min read · LW link

Economics Roundup #5

Zvi · Feb 25, 2025, 1:40 PM
27 points
10 comments · 20 min read · LW link
(thezvi.wordpress.com)

Making alignment a law of the universe

Richard Juggins · Feb 25, 2025, 10:44 AM
0 points
3 comments · 15 min read · LW link

Revisiting Conway’s Law

annebrandes · Feb 25, 2025, 8:33 AM
12 points
4 comments · 3 min read · LW link

Demystifying the Pinocchio Paradox

Novak Zukowski · Feb 25, 2025, 6:16 AM
−1 points
0 comments · 3 min read · LW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

Juliezhanggg · Feb 25, 2025, 4:20 AM
8 points
0 comments · 5 min read · LW link

Upcoming Protest for AI Safety

Matt Vincent · Feb 25, 2025, 3:04 AM
12 points
0 comments · 1 min read · LW link
(www.pauseai-us.org)

what an efficient market feels from inside

DMMF · Feb 25, 2025, 2:38 AM
40 points
9 comments · 6 min read · LW link
(danfrank.ca)

Metacompilation

Donald Hobson · Feb 24, 2025, 10:58 PM
11 points
1 comment · 4 min read · LW link

The manifest manifesto

dkl9 · Feb 24, 2025, 10:13 PM
6 points
2 comments · 2 min read · LW link
(dkl9.net)

Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal

pocock · Feb 24, 2025, 9:28 PM
3 points
0 comments · 1 min read · LW link
(juristgate.com)

Topological Data Analysis and Mechanistic Interpretability

Gunnar Carlsson · Feb 24, 2025, 7:56 PM
16 points
4 comments · 7 min read · LW link

Zizian comparisons / connections in the open source & Linux communities

pocock · Feb 24, 2025, 7:55 PM
−15 points
0 comments · 1 min read · LW link

Local Trust

Feb 24, 2025, 7:53 PM
21 points
4 comments · 5 min read · LW link

Nationwide Action Workshop: Contact Congress about AI safety!

Felix De Simone · Feb 24, 2025, 7:36 PM
7 points
0 comments · 1 min read · LW link

Anthropic releases Claude 3.7 Sonnet with extended thinking mode

LawrenceC · Feb 24, 2025, 7:32 PM
88 points
8 comments · 4 min read · LW link
(www.anthropic.com)

Training AI to do alignment research we don’t already know how to do

joshc · Feb 24, 2025, 7:19 PM
45 points
23 comments · 7 min read · LW link

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Feb 24, 2025, 6:56 PM
51 points
0 comments · 10 min read · LW link
(www.convergenceanalysis.org)

Evaluating “What 2026 Looks Like” So Far

Jonny Spicer · Feb 24, 2025, 6:55 PM
77 points
5 comments · 7 min read · LW link

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Feb 24, 2025, 6:31 PM
44 points
15 comments · 11 min read · LW link

Understanding Agent Preferences

martinkunev · Feb 24, 2025, 5:46 PM
6 points
2 comments · 14 min read · LW link

What We Can Do to Prevent Extinction by AI

Joe Rogero · Feb 24, 2025, 5:15 PM
12 points
0 comments · LW link

Dream, Truth, & Good

abramdemski · Feb 24, 2025, 4:59 PM
50 points
11 comments · 4 min read · LW link

Forecasting Frontier Language Model Agent Capabilities

Feb 24, 2025, 4:51 PM
35 points
0 comments · 5 min read · LW link
(www.apolloresearch.ai)

A City Within a City

Declan Molony · Feb 24, 2025, 3:51 PM
48 points
1 comment · 7 min read · LW link

Grok Grok

Zvi · Feb 24, 2025, 2:20 PM
36 points
2 comments · 19 min read · LW link
(thezvi.wordpress.com)

if you’re not happy single, you won’t be happy immortal

daijin · Feb 24, 2025, 1:23 PM
2 points
1 comment · 1 min read · LW link

[NSFW] The Fuzzy Handcuffs of Liberation

lsusr · Feb 24, 2025, 1:05 PM
27 points
11 comments · 2 min read · LW link

Dayton, Ohio, HPMOR 10 year Anniversary meetup

Lunawarrior · Feb 24, 2025, 12:55 PM
1 point
0 comments · 1 min read · LW link

An Alternate History of the Future, 2025-2040

Mr Beastly · Feb 24, 2025, 5:53 AM
3 points
5 comments · 10 min read · LW link

Export Surplusses

lsusr · Feb 24, 2025, 5:53 AM
24 points
21 comments · 3 min read · LW link

AI alignment for mental health supports

hiki_t · Feb 24, 2025, 4:21 AM
1 point
1 comment · 1 min read · LW link

The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research

Feb 24, 2025, 2:17 AM
48 points
1 comment · 7 min read · LW link

Poll on AI opinions.

Niclas Kupper · Feb 23, 2025, 10:39 PM
1 point
2 comments · 1 min read · LW link

The Geometry of Linear Regression versus PCA

criticalpoints · Feb 23, 2025, 9:01 PM
20 points
7 comments · 6 min read · LW link
(eregis.github.io)

Judgements: Merging Prediction & Evidence

abramdemski · Feb 23, 2025, 7:35 PM
103 points
5 comments · 6 min read · LW link

Intelligence as Privilege Escalation

Cole Wyeth · Feb 23, 2025, 7:31 PM
28 points
0 comments · 5 min read · LW link

[Question] Have LLMs Generated Novel Insights?

23 Feb 2025 18:22 UTC
159 points
41 comments · 2 min read · LW link

The case for corporal punishment

Yair Halberstadt · 23 Feb 2025 15:05 UTC
27 points
4 comments · 2 min read · LW link

Reflections on the state of the race to superintelligence, February 2025

Mitchell_Porter · 23 Feb 2025 13:58 UTC
21 points
7 comments · 4 min read · LW link

List of most interesting ideas I encountered in my life, ranked

Lucien · 23 Feb 2025 12:36 UTC
21 points
6 comments · 1 min read · LW link

Test of the Bene Gesserit

lsusr · 23 Feb 2025 11:51 UTC
19 points
3 comments · 3 min read · LW link

Moral gauge theory: A speculative suggestion for AI alignment

James Diacoumis · 23 Feb 2025 11:42 UTC
6 points
2 comments · 8 min read · LW link

[Question] Does human (mis)alignment pose a significant and imminent existential threat?

jr · 23 Feb 2025 10:03 UTC
6 points
3 comments · 1 min read · LW link

Deep sparse autoencoders yield interpretable features too

Armaan A. Abraham · 23 Feb 2025 5:46 UTC
29 points
8 comments · 8 min read · LW link