A Platform for Falsifiable Conjectures and Public Refutation — Would This Be Useful?

PetrusNonius · 8 Apr 2025 21:09 UTC
1 point
1 comment · 1 min read · LW link

Quantifying SAE Quality with Feature Steerability Metrics

phenomanon · 8 Apr 2025 20:55 UTC
2 points
0 comments · 4 min read · LW link

MATS is hiring!

8 Apr 2025 20:45 UTC
8 points
0 comments · 6 min read · LW link

birds and mammals independently evolved intelligence

bhauth · 8 Apr 2025 20:00 UTC
73 points
23 comments · 1 min read · LW link
(www.quantamagazine.org)

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

8 Apr 2025 17:32 UTC
147 points
20 comments · 12 min read · LW link

Thinking Machines

Knight Lee · 8 Apr 2025 17:27 UTC
3 points
0 comments · 6 min read · LW link

Digital Error Correction and Lock-In

alamerton · 8 Apr 2025 15:46 UTC
1 point
0 comments · 5 min read · LW link
(alfielamerton.substack.com)

[Question] What faithfulness metrics should general claims about CoT faithfulness be based upon?

Rauno Arike · 8 Apr 2025 15:27 UTC
24 points
0 comments · 4 min read · LW link

AI 2027: Responses

Zvi · 8 Apr 2025 12:50 UTC
111 points
3 comments · 30 min read · LW link
(thezvi.wordpress.com)

The first AI war will be in your computer

Viliam · 8 Apr 2025 9:28 UTC
43 points
10 comments · 3 min read · LW link

Who wants to bet me $25k at 1:7 odds that there won’t be an AI market crash in the next year?

Remmelt · 8 Apr 2025 8:31 UTC
25 points
19 comments · 1 min read · LW link

Rethinking Friction: Equity and Motivation Across Domains

eltimbalino · 8 Apr 2025 3:58 UTC
−1 points
0 comments · 2 min read · LW link
(www.lesswrong.com)

On different discussion traditions

Eugene Shcherbinin · 7 Apr 2025 23:00 UTC
1 point
0 comments · 2 min read · LW link

Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks

shash42 · 7 Apr 2025 21:50 UTC
16 points
2 comments · 1 min read · LW link

AI Safety at the Frontier: Paper Highlights, March ’25

gasteigerjo · 7 Apr 2025 20:17 UTC
9 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Factory farming intelligent minds

Odd anon · 7 Apr 2025 20:05 UTC
5 points
6 comments · 20 min read · LW link

What alignment-relevant abilities might Terence Tao lack?

Towards_Keeperhood · 7 Apr 2025 19:44 UTC
13 points
2 comments · 3 min read · LW link

[Question] Are there any (semi-)detailed future scenarios where we win?

Jan Betley · 7 Apr 2025 19:13 UTC
15 points
3 comments · 1 min read · LW link

Austin Chen on Winning, Risk-Taking, and FTX

Elizabeth · 7 Apr 2025 19:00 UTC
35 points
3 comments · 1 min read · LW link
(acesounderglass.com)

deleted

funnyfranco · 7 Apr 2025 18:56 UTC
−24 points
11 comments · 1 min read · LW link

American College Admissions Doesn’t Need to Be So Competitive

Arjun Panickssery · 7 Apr 2025 17:35 UTC
48 points
20 comments · 6 min read · LW link
(arjunpanickssery.substack.com)

Coupling for Decouplers

Jacob Falkovich · 7 Apr 2025 15:40 UTC
16 points
3 comments · 8 min read · LW link

Moonlight Reflected

Jacob Falkovich · 7 Apr 2025 15:35 UTC
11 points
0 comments · 9 min read · LW link

Navigation by Moonlight

Jacob Falkovich · 7 Apr 2025 15:32 UTC
24 points
39 comments · 8 min read · LW link

You Are Not a Thought Experiment

Jacob Falkovich · 7 Apr 2025 15:27 UTC
5 points
0 comments · 9 min read · LW link

Love is Love, Science is Fake

Jacob Falkovich · 7 Apr 2025 15:19 UTC
17 points
2 comments · 10 min read · LW link

Coupling for Decouplers — Intro

Jacob Falkovich · 7 Apr 2025 15:12 UTC
9 points
0 comments · 1 min read · LW link

The world according to ChatGPT

Richard_Kennaway · 7 Apr 2025 13:44 UTC
11 points
0 comments · 2 min read · LW link

AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander

Zvi · 7 Apr 2025 13:40 UTC
67 points
2 comments · 26 min read · LW link
(thezvi.wordpress.com)

Arguing all sides with ChatGPT 4.5

Richard_Kennaway · 7 Apr 2025 13:10 UTC
6 points
0 comments · 8 min read · LW link

The Same Heaven

Lukas Petersson · 7 Apr 2025 12:57 UTC
7 points
1 comment · 5 min read · LW link
(lukaspetersson.com)

TAMing The Alignment Problem

JasonB · 7 Apr 2025 8:47 UTC
11 points
2 comments · 11 min read · LW link

Well-foundedness as an organizing principle of healthy minds and societies

Richard_Ngo · 7 Apr 2025 0:31 UTC
35 points
7 comments · 6 min read · LW link
(www.mindthefuture.info)

Arusha Perpetual Chicken—an unlikely iterated game

James Stephen Brown · 6 Apr 2025 22:56 UTC
15 points
1 comment · 5 min read · LW link
(nonzerosum.games)

How Gay is the Vatican?

rba · 6 Apr 2025 21:27 UTC
63 points
34 comments · 7 min read · LW link

Australia’s AI Crossroads: Election 2025 Town Hall

Peter Horniak · 6 Apr 2025 21:17 UTC
1 point
0 comments · 1 min read · LW link

The Lizardman and the Black Hat Bobcat

Screwtape · 6 Apr 2025 19:02 UTC
109 points
15 comments · 9 min read · LW link

Would this solve the (outer) alignment problem, or at least help?

Wes R · 6 Apr 2025 18:49 UTC
−2 points
1 comment · 13 min read · LW link

[Question] What are the fundamental differences between teaching the AIs and humans?

StanislavKrym · 6 Apr 2025 18:17 UTC
3 points
0 comments · 1 min read · LW link

An “Optimistic” 2027 Timeline

Yitz · 6 Apr 2025 16:39 UTC
13 points
16 comments · 9 min read · LW link

Thoughts on Creating a Good Language

Towards_Keeperhood · 6 Apr 2025 15:57 UTC
1 point
2 comments · 7 min read · LW link

The REPHRASE Circuit: How Fine-Tuning Enhances LLMs to REPHRASE Text

Karthik Viswanathan · 6 Apr 2025 15:02 UTC
4 points
0 comments · 5 min read · LW link

[Research sprint] Single-model crosscoder feature ablation and steering

Thomas Read · 6 Apr 2025 14:42 UTC
10 points
0 comments · 12 min read · LW link

Ferrer, Pilar, and Me

Askwho · 6 Apr 2025 11:22 UTC
21 points
1 comment · 4 min read · LW link
(open.substack.com)

FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling

Daniil Strizhov · 6 Apr 2025 5:27 UTC
1 point
0 comments · 7 min read · LW link

A collection of approaches to confronting doom, and my thoughts on them

Ruby · 6 Apr 2025 2:11 UTC
48 points
18 comments · 12 min read · LW link

A Slow Guide to Confronting Doom

Ruby · 6 Apr 2025 2:10 UTC
86 points
20 comments · 14 min read · LW link

[Linkpost] Visual roadmap to strong human germline engineering

TsviBT · 5 Apr 2025 22:22 UTC
30 points
0 comments · 1 min read · LW link

Google DeepMind: An Approach to Technical AGI Safety and Security

Rohin Shah · 5 Apr 2025 22:00 UTC
73 points
12 comments · 18 min read · LW link
(arxiv.org)

Introduction to Representing Sentences as Logical Statements

Towards_Keeperhood · 5 Apr 2025 20:35 UTC
33 points
10 comments · 16 min read · LW link