Why Can’t We Hypothesize After the Fact?

David Udell · 26 Feb 2025 22:41 UTC
40 points
3 comments · 2 min read · LW link

“AI Rapidly Gets Smarter, And Makes Some of Us Dumber,” from Sabine Hossenfelder

Evan_Gaensbauer · 26 Feb 2025 22:33 UTC
4 points
9 comments · 2 min read · LW link
(youtu.be)

METR: AI models can be dangerous before public deployment

UnofficialLinkpostBot · 26 Feb 2025 20:19 UTC
16 points
0 comments · 3 min read · LW link
(metr.org)

Representation Engineering has Its Problems, but None Seem Unsolvable

Lukasz G Bartoszcze · 26 Feb 2025 19:53 UTC
15 points
1 comment · 3 min read · LW link

Thoughts that prompt good forecasts: A survey

Daniel_Friedrich · 26 Feb 2025 18:36 UTC
1 point
0 comments · 1 min read · LW link

The non-tribal tribes

PatrickDFarley · 26 Feb 2025 17:22 UTC
24 points
4 comments · 16 min read · LW link

SAE Training Dataset Influence in Feature Matching and a Hypothesis on Position Features

Seonglae Cho · 26 Feb 2025 17:05 UTC
4 points
3 comments · 17 min read · LW link

Fuzzing LLMs sometimes makes them reveal their secrets

Fabien Roger · 26 Feb 2025 16:48 UTC
64 points
13 comments · 9 min read · LW link

You can just wear a suit

lsusr · 26 Feb 2025 14:57 UTC
131 points
53 comments · 2 min read · LW link

Matthew Yglesias - Misinformation Mostly Confuses Your Own Side

Siebe · 26 Feb 2025 14:55 UTC
10 points
1 comment · 1 min read · LW link
(www.slowboring.com)

Optimizing Feedback to Learn Faster

Towards_Keeperhood · 26 Feb 2025 14:24 UTC
12 points
0 comments · 2 min read · LW link

outlining is a historically recent underutilized gift to family

daijin · 26 Feb 2025 13:58 UTC
4 points
2 comments · 3 min read · LW link

Osaka

lsusr · 26 Feb 2025 13:50 UTC
76 points
11 comments · 1 min read · LW link

Time to Welcome Claude 3.7

Zvi · 26 Feb 2025 13:00 UTC
49 points
2 comments · 24 min read · LW link
(thezvi.wordpress.com)

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik · 26 Feb 2025 12:50 UTC
79 points
8 comments · 7 min read · LW link

Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions

Rareș Baron · 26 Feb 2025 11:35 UTC
5 points
13 comments · 4 min read · LW link

[Question] Name for Standard AI Caveat?

yrimon · 26 Feb 2025 7:07 UTC
6 points
5 comments · 1 min read · LW link

Levels of analysis for thinking about agency

Cole Wyeth · 26 Feb 2025 4:24 UTC
11 points
0 comments · 7 min read · LW link

The Stag Hunt - cultivating cooperation to reap rewards

James Stephen Brown · 25 Feb 2025 23:45 UTC
7 points
0 comments · 4 min read · LW link
(nonzerosum.games)

Three Levels for Large Language Model Cognition

Eleni Angelou · 25 Feb 2025 23:14 UTC
21 points
0 comments · 5 min read · LW link

[Crosspost] Strategic wealth accumulation under transformative AI expectations

25 Feb 2025 21:50 UTC
5 points
0 comments · 17 min read · LW link
(forum.effectivealtruism.org)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

25 Feb 2025 17:39 UTC
330 points
92 comments · 4 min read · LW link

We Can Build Compassionate AI

Gordon Seidoh Worley · 25 Feb 2025 16:37 UTC
9 points
6 comments · 4 min read · LW link
(uncertainupdates.substack.com)

[Question] Intellectual lifehacks repo

Antoine de Scorraille · 25 Feb 2025 16:32 UTC
11 points
15 comments · 1 min read · LW link

Economics Roundup #5

Zvi · 25 Feb 2025 13:40 UTC
27 points
10 comments · 20 min read · LW link
(thezvi.wordpress.com)

Making alignment a law of the universe

Richard Juggins · 25 Feb 2025 10:44 UTC
0 points
3 comments · 15 min read · LW link

Revisiting Conway’s Law

annebrandes · 25 Feb 2025 8:33 UTC
13 points
4 comments · 2 min read · LW link

Demystifying the Pinocchio Paradox

Novak Zukowski · 25 Feb 2025 6:16 UTC
−1 points
0 comments · 3 min read · LW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

Juliezhanggg · 25 Feb 2025 4:20 UTC
8 points
0 comments · 5 min read · LW link

Upcoming Protest for AI Safety

Matt Vincent · 25 Feb 2025 3:04 UTC
12 points
0 comments · 1 min read · LW link
(www.pauseai-us.org)

what an efficient market feels from inside

DMMF · 25 Feb 2025 2:38 UTC
41 points
9 comments · 6 min read · LW link
(danfrank.ca)

Metacompilation

Donald Hobson · 24 Feb 2025 22:58 UTC
11 points
1 comment · 4 min read · LW link

The manifest manifesto

dkl9 · 24 Feb 2025 22:13 UTC
6 points
2 comments · 2 min read · LW link
(dkl9.net)

Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal

pocock · 24 Feb 2025 21:28 UTC
3 points
0 comments · 1 min read · LW link
(juristgate.com)

Topological Data Analysis and Mechanistic Interpretability

Gunnar Carlsson · 24 Feb 2025 19:56 UTC
16 points
4 comments · 7 min read · LW link

Zizian comparisons / connections in the open source & Linux communities

pocock · 24 Feb 2025 19:55 UTC
−15 points
0 comments · 1 min read · LW link

Local Trust

24 Feb 2025 19:53 UTC
21 points
4 comments · 5 min read · LW link

Nationwide Action Workshop: Contact Congress about AI safety!

Felix De Simone · 24 Feb 2025 19:36 UTC
7 points
0 comments · 1 min read · LW link

Anthropic releases Claude 3.7 Sonnet with extended thinking mode

LawrenceC · 24 Feb 2025 19:32 UTC
88 points
8 comments · 4 min read · LW link
(www.anthropic.com)

Training AI to do alignment research we don’t already know how to do

joshc · 24 Feb 2025 19:19 UTC
45 points
24 comments · 7 min read · LW link

Conference Report: Threshold 2030 - Modeling AI Economic Futures

24 Feb 2025 18:56 UTC
52 points
0 comments · 10 min read · LW link
(www.convergenceanalysis.org)

Evaluating “What 2026 Looks Like” So Far

Jonny Spicer · 24 Feb 2025 18:55 UTC
78 points
6 comments · 7 min read · LW link

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

24 Feb 2025 18:31 UTC
44 points
15 comments · 11 min read · LW link

Understanding Agent Preferences

martinkunev · 24 Feb 2025 17:46 UTC
6 points
2 comments · 14 min read · LW link

What We Can Do to Prevent Extinction by AI

Joe Rogero · 24 Feb 2025 17:15 UTC
12 points
0 comments · 11 min read · LW link

Dream, Truth, & Good

abramdemski · 24 Feb 2025 16:59 UTC
50 points
11 comments · 4 min read · LW link

Forecasting Frontier Language Model Agent Capabilities

24 Feb 2025 16:51 UTC
35 points
0 comments · 5 min read · LW link
(www.apolloresearch.ai)

A City Within a City

Declan Molony · 24 Feb 2025 15:51 UTC
50 points
1 comment · 7 min read · LW link

Grok Grok

Zvi · 24 Feb 2025 14:20 UTC
36 points
2 comments · 19 min read · LW link
(thezvi.wordpress.com)

if you’re not happy single, you won’t be happy immortal

daijin · 24 Feb 2025 13:23 UTC
2 points
1 comment · 1 min read · LW link