[New Jersey] HPMOR 10 Year Anniversary Party 🎉

🟠UnlimitedOranges🟠 · Feb 27, 2025, 10:30 PM
4 points
0 comments · 1 min read · LW link

OpenAI releases GPT-4.5

Seth Herd · Feb 27, 2025, 9:40 PM
34 points
12 comments · 3 min read · LW link
(openai.com)

The Elicitation Game: Evaluating capability elicitation techniques

Feb 27, 2025, 8:33 PM
10 points
0 comments · 2 min read · LW link

For the Sake of Pleasure Alone

Greenless Mirror · Feb 27, 2025, 8:07 PM
4 points
14 comments · 12 min read · LW link

Keeping AI Subordinate to Human Thought: A Proposal for Public AI Conversations

syh · Feb 27, 2025, 8:00 PM
−1 points
0 comments · 1 min read · LW link
(medium.com)

How to Corner Liars: A Miasma-Clearing Protocol

ymeskhout · Feb 27, 2025, 5:18 PM
62 points
23 comments · 7 min read · LW link
(www.ymeskhout.com)

Economic Topology, ASI, and the Separation Equilibrium

mkualquiera · Feb 27, 2025, 4:36 PM
2 points
11 comments · 6 min read · LW link

The Illusion of Iterative Improvement: Why AI (and Humans) Fail to Track Their Own Epistemic Drift

Andy E Williams · Feb 27, 2025, 4:26 PM
1 point
3 comments · 4 min read · LW link

AI #105: Hey There Alexa

Zvi · Feb 27, 2025, 2:30 PM
31 points
3 comments · 40 min read · LW link
(thezvi.wordpress.com)

Space-Faring Civilization density estimates and models—Review

Maxime Riché · Feb 27, 2025, 11:44 AM
20 points
0 comments · 12 min read · LW link

Market Capitalization is Semantically Invalid

Zero Contradictions · Feb 27, 2025, 11:27 AM
3 points
14 comments · 3 min read · LW link
(thewaywardaxolotl.blogspot.com)

Proposing Human Survival Strategy based on the NAIA Vision: Toward the Co-evolution of Diverse Intelligences

Hiroshi Yamakawa · Feb 27, 2025, 5:18 AM
−2 points
0 comments · 11 min read · LW link

Short & long term tradeoffs of strategic voting

kaleb · Feb 27, 2025, 4:25 AM
2 points
0 comments · 8 min read · LW link

Recursive alignment with the principle of alignment

hive · Feb 27, 2025, 2:34 AM
9 points
1 comment · 15 min read · LW link
(hiveism.substack.com)

Kingfisher Tour February 2025

jefftk · Feb 27, 2025, 2:20 AM
9 points
0 comments · 4 min read · LW link
(www.jefftk.com)

You should use Consumer Reports

KvmanThinking · Feb 27, 2025, 1:52 AM
7 points
5 comments · 1 min read · LW link

Universal AI Maximizes Variational Empowerment: New Insights into AGI Safety

Yusuke Hayashi · Feb 27, 2025, 12:46 AM
7 points
0 comments · 4 min read · LW link

Why Can’t We Hypothesize After the Fact?

David Udell · Feb 26, 2025, 10:41 PM
40 points
3 comments · 2 min read · LW link

“AI Rapidly Gets Smarter, And Makes Some of Us Dumber,” from Sabine Hossenfelder

Evan_Gaensbauer · Feb 26, 2025, 10:33 PM
4 points
9 comments · 2 min read · LW link
(youtu.be)

METR: AI models can be dangerous before public deployment

UnofficialLinkpostBot · Feb 26, 2025, 8:19 PM
16 points
0 comments · 3 min read · LW link
(metr.org)

Representation Engineering has Its Problems, but None Seem Unsolvable

Lukasz G Bartoszcze · Feb 26, 2025, 7:53 PM
15 points
1 comment · 3 min read · LW link

Thoughts that prompt good forecasts: A survey

Daniel_Friedrich · Feb 26, 2025, 6:36 PM
1 point
0 comments · 1 min read · LW link

The non-tribal tribes

PatrickDFarley · Feb 26, 2025, 5:22 PM
24 points
4 comments · 16 min read · LW link

SAE Training Dataset Influence in Feature Matching and a Hypothesis on Position Features

Seonglae Cho · Feb 26, 2025, 5:05 PM
4 points
3 comments · 17 min read · LW link

Fuzzing LLMs sometimes makes them reveal their secrets

Fabien Roger · Feb 26, 2025, 4:48 PM
62 points
13 comments · 9 min read · LW link

You can just wear a suit

lsusr · Feb 26, 2025, 2:57 PM
111 points
48 comments · 2 min read · LW link

Matthew Yglesias—Misinformation Mostly Confuses Your Own Side

Siebe · Feb 26, 2025, 2:55 PM
10 points
1 comment · 1 min read · LW link
(www.slowboring.com)

Optimizing Feedback to Learn Faster

Towards_Keeperhood · Feb 26, 2025, 2:24 PM
12 points
0 comments · 2 min read · LW link

outlining is a historically recent underutilized gift to family

daijin · Feb 26, 2025, 1:58 PM
4 points
2 comments · 3 min read · LW link

Osaka

lsusr · Feb 26, 2025, 1:50 PM
73 points
11 comments · 1 min read · LW link

Time to Welcome Claude 3.7

Zvi · Feb 26, 2025, 1:00 PM
49 points
2 comments · 24 min read · LW link
(thezvi.wordpress.com)

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik · Feb 26, 2025, 12:50 PM
79 points
8 comments · 7 min read · LW link

Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions

Rareș Baron · Feb 26, 2025, 11:35 AM
3 points
13 comments · 4 min read · LW link

[Question] Name for Standard AI Caveat?

yrimon · Feb 26, 2025, 7:07 AM
6 points
5 comments · 1 min read · LW link

Levels of analysis for thinking about agency

Cole Wyeth · Feb 26, 2025, 4:24 AM
11 points
0 comments · 7 min read · LW link

The Stag Hunt—cultivating cooperation to reap rewards

James Stephen Brown · Feb 25, 2025, 11:45 PM
7 points
0 comments · 4 min read · LW link
(nonzerosum.games)

Three Levels for Large Language Model Cognition

Eleni Angelou · Feb 25, 2025, 11:14 PM
21 points
0 comments · 5 min read · LW link

[Crosspost] Strategic wealth accumulation under transformative AI expectations

Feb 25, 2025, 9:50 PM
5 points
0 comments · 17 min read · LW link
(forum.effectivealtruism.org)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
329 points
91 comments · 4 min read · LW link

We Can Build Compassionate AI

Gordon Seidoh Worley · Feb 25, 2025, 4:37 PM
9 points
6 comments · 4 min read · LW link
(uncertainupdates.substack.com)

[Question] Intellectual lifehacks repo

Antoine de Scorraille · Feb 25, 2025, 4:32 PM
11 points
15 comments · 1 min read · LW link

Economics Roundup #5

Zvi · Feb 25, 2025, 1:40 PM
27 points
10 comments · 20 min read · LW link
(thezvi.wordpress.com)

Making alignment a law of the universe

Richard Juggins · Feb 25, 2025, 10:44 AM
0 points
3 comments · 15 min read · LW link

Revisiting Conway’s Law

annebrandes · Feb 25, 2025, 8:33 AM
12 points
4 comments · 3 min read · LW link

Demystifying the Pinocchio Paradox

Novak Zukowski · Feb 25, 2025, 6:16 AM
−1 points
0 comments · 3 min read · LW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

Juliezhanggg · Feb 25, 2025, 4:20 AM
8 points
0 comments · 5 min read · LW link

Upcoming Protest for AI Safety

Matt Vincent · Feb 25, 2025, 3:04 AM
12 points
0 comments · 1 min read · LW link
(www.pauseai-us.org)

what an efficient market feels from inside

DMMF · Feb 25, 2025, 2:38 AM
40 points
9 comments · 6 min read · LW link
(danfrank.ca)

Metacompilation

Donald Hobson · Feb 24, 2025, 10:58 PM
11 points
1 comment · 4 min read · LW link

The manifest manifesto

dkl9 · Feb 24, 2025, 10:13 PM
6 points
2 comments · 2 min read · LW link
(dkl9.net)