Social Dilemmas — public goods, free riders, and exploitation

James Stephen Brown · 5 Mar 2025 23:31 UTC
7 points
0 comments · 3 min read · LW link
(nonzerosum.games)

Introducing MASK: A Benchmark for Measuring Honesty in AI Systems

5 Mar 2025 22:56 UTC
37 points
5 comments · 2 min read · LW link
(www.mask-benchmark.ai)

The Hardware-Software Framework: A New Perspective on Economic Growth with AI

Jakub Growiec · 5 Mar 2025 19:59 UTC
13 points
0 comments · 3 min read · LW link

NY State Has a New Frontier Model Bill (+quick takes)

henryj · 5 Mar 2025 19:29 UTC
9 points
0 comments · 1 min read · LW link
(www.henryjosephson.com)

The old memories tree

Yair Halberstadt · 5 Mar 2025 19:03 UTC
7 points
1 comment · 1 min read · LW link

Reply to Vitalik on d/acc

samuelshadrach · 5 Mar 2025 18:55 UTC
8 points
0 comments · 3 min read · LW link
(samuelshadrach.com)

A Bear Case: My Predictions Regarding AI Progress

Thane Ruthenis · 5 Mar 2025 16:41 UTC
377 points
163 comments · 9 min read · LW link

On the Rationality of Deterring ASI

Dan H · 5 Mar 2025 16:11 UTC
171 points
34 comments · 4 min read · LW link
(nationalsecurity.ai)

On OpenAI’s Safety and Alignment Philosophy

Zvi · 5 Mar 2025 14:00 UTC
58 points
5 comments · 17 min read · LW link
(thezvi.wordpress.com)

The Alignment Imperative: Act Now or Lose Everything

racinkc1 · 5 Mar 2025 5:49 UTC
−14 points
0 comments · 1 min read · LW link

Contra Dance Pay and Inflation

jefftk · 5 Mar 2025 2:40 UTC
12 points
0 comments · 2 min read · LW link
(www.jefftk.com)

*NYT Op-Ed* The Government Knows A.G.I. Is Coming

worse · 5 Mar 2025 1:53 UTC
11 points
12 comments · 2 min read · LW link
(www.nytimes.com)

Could this be an unusually good time to Earn To Give?

TomGardiner · 4 Mar 2025 21:51 UTC
−1 points
0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

What is the best / most proper definition of “Feeling the AGI” there is?

Annapurna · 4 Mar 2025 20:13 UTC
8 points
5 comments · 1 min read · LW link

Energy Markets Temporal Arbitrage with Batteries

NickyP · 4 Mar 2025 17:37 UTC
28 points
3 comments · 16 min read · LW link

Distillation of Meta’s Large Concept Models Paper

NickyP · 4 Mar 2025 17:33 UTC
19 points
3 comments · 4 min read · LW link

Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource

4 Mar 2025 17:01 UTC
33 points
2 comments · 1 min read · LW link

2028 Should Not Be AI Safety’s First Foray Into Politics

Jesse Richardson · 4 Mar 2025 16:46 UTC
5 points
0 comments · 2 min read · LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis · 4 Mar 2025 16:23 UTC
141 points
52 comments · 3 min read · LW link

Validating against a misalignment detector is very different to training against one

mattmacdermott · 4 Mar 2025 15:41 UTC
46 points
4 comments · 4 min read · LW link

For scheming, we should first focus on detection and then on prevention

Marius Hobbhahn · 4 Mar 2025 15:22 UTC
53 points
7 comments · 5 min read · LW link

Progress links and short notes, 2025-03-03

jasoncrawford · 4 Mar 2025 15:20 UTC
8 points
0 comments · 6 min read · LW link
(newsletter.rootsofprogress.org)

Formation Research: Organisation Overview

alamerton · 4 Mar 2025 15:03 UTC
6 points
0 comments · 11 min read · LW link

On Writing #1

Zvi · 4 Mar 2025 13:30 UTC
38 points
2 comments · 15 min read · LW link
(thezvi.wordpress.com)

The Semi-Rational Militar Firefighter

P. João · 4 Mar 2025 12:23 UTC
73 points
10 comments · 2 min read · LW link

Observations About LLM Inference Pricing

Aaron_Scher · 4 Mar 2025 3:03 UTC
40 points
2 comments · 9 min read · LW link
(techgov.intelligence.org)

[Question] How much should I worry about the Atlanta Fed’s GDP estimates?

Brendan Long · 4 Mar 2025 2:03 UTC
16 points
2 comments · 1 min read · LW link

[Question] shouldn’t we try to get media attention?

KvmanThinking · 4 Mar 2025 1:39 UTC
6 points
1 comment · 1 min read · LW link

The Milton Friedman Model of Policy Change

JohnofCharleston · 4 Mar 2025 0:38 UTC
152 points
17 comments · 4 min read · LW link

The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset.

keltan · 3 Mar 2025 23:15 UTC
15 points
10 comments · 1 min read · LW link

AI Safety at the Frontier: Paper Highlights, February ’25

gasteigerjo · 3 Mar 2025 22:09 UTC
7 points
0 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo · 3 Mar 2025 20:08 UTC
91 points
20 comments · 18 min read · LW link

Takeaways From Our Recent Work on SAE Probing

3 Mar 2025 19:50 UTC
30 points
4 comments · 5 min read · LW link

Why People Commit White Collar Fraud (Ozy linkpost)

sapphire · 3 Mar 2025 19:33 UTC
24 points
1 comment · 1 min read · LW link
(thingofthings.substack.com)

[Question] Ask Me Anything—Samuel

samuelshadrach · 3 Mar 2025 19:24 UTC
0 points
0 comments · 1 min read · LW link

Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing

racinkc1 · 3 Mar 2025 19:23 UTC
1 point
0 comments · 1 min read · LW link

Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers

3 Mar 2025 19:05 UTC
41 points
1 comment · 1 min read · LW link
(papers.ssrn.com)

Middle School Choice

jefftk · 3 Mar 2025 16:10 UTC
27 points
10 comments · 4 min read · LW link
(www.jefftk.com)

On GPT-4.5

Zvi · 3 Mar 2025 13:40 UTC
44 points
12 comments · 22 min read · LW link
(thezvi.wordpress.com)

Coalescence—Determinism In Ways We Care About

vitaliya · 3 Mar 2025 13:20 UTC
12 points
0 comments · 11 min read · LW link

Methods for strong human germline engineering

TsviBT · 3 Mar 2025 8:13 UTC
149 points
29 comments · 108 min read · LW link

[Question] Examples of self-fulfilling prophecies in AI alignment?

Chris Lakin · 3 Mar 2025 2:45 UTC
30 points
13 comments · 1 min read · LW link

[Question] Request for Comments on AI-related Prediction Market Ideas

PeterMcCluskey · 2 Mar 2025 20:52 UTC
17 points
1 comment · 3 min read · LW link

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard · 2 Mar 2025 20:26 UTC
154 points
26 comments · 9 min read · LW link

Cautions about LLMs in Human Cognitive Loops

Alice Blair · 2 Mar 2025 19:53 UTC
42 points
14 comments · 7 min read · LW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout · 2 Mar 2025 19:51 UTC
162 points
29 comments · 1 min read · LW link
(turntrout.com)

Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week

spencerg · 2 Mar 2025 18:01 UTC
13 points
0 comments · 1 min read · LW link

[Question] Will LLM agents become the first takeover-capable AGIs?

Seth Herd · 2 Mar 2025 17:15 UTC
37 points
10 comments · 1 min read · LW link

Not-yet-falsifiable beliefs?

Benjamin Hendricks · 2 Mar 2025 14:11 UTC
6 points
4 comments · 1 min read · LW link

Saving Zest

jefftk · 2 Mar 2025 12:00 UTC
24 points
1 comment · 1 min read · LW link
(www.jefftk.com)