Outlaw Code

scarcegreengrass
Jan 30, 2025, 11:41 PM
10 points
1 comment
2 min read
LW link

Can someone, anyone, make superintelligence a more concrete concept?

Ori Nagel
Jan 30, 2025, 11:25 PM
3 points
6 comments
4 min read
LW link

Upcoming Neuroscience Workshop—Functionalizing Brain Data, Ground-Truthing, and the Role of Artificial Data in Advancing Neuroscience

Devin Ward
Jan 30, 2025, 11:02 PM
1 point
0 comments
1 min read
LW link

What’s Behind the SynBio Bust?

sarahconstantin
Jan 30, 2025, 10:30 PM
55 points
8 comments
6 min read
LW link
(sarahconstantin.substack.com)

The future of humanity is in management

jasoncrawford
Jan 30, 2025, 10:14 PM
2 points
5 comments
13 min read
LW link
(newsletter.rootsofprogress.org)

[Translation] AI Generated Fake News is Taking Over my Family Group Chat

mushroomsoup
Jan 30, 2025, 8:24 PM
3 points
0 comments
6 min read
LW link

A sketch of an AI control safety case

Jan 30, 2025, 5:28 PM
57 points
0 comments
5 min read
LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
163 points
58 comments
2 min read
LW link
(gradual-disempowerment.ai)

[Question] Implication of Uncomputable Problems

Nathan1123
Jan 30, 2025, 4:48 PM
−3 points
3 comments
1 min read
LW link

Hello World

Charlie Sanders
Jan 30, 2025, 3:33 PM
6 points
0 comments
2 min read
LW link
(www.dailymicrofiction.com)

Introducing the Coalition for a Baruch Plan for AI: A Call for a Radical Treaty-Making process for the Global Governance of AI

rguerreschi
Jan 30, 2025, 3:26 PM
11 points
0 comments
2 min read
LW link

AI #101: The Shallow End

Zvi
Jan 30, 2025, 2:50 PM
39 points
1 comment
59 min read
LW link
(thezvi.wordpress.com)

Memorization-generalization in practice

Dmitry Vaintrob
Jan 30, 2025, 2:10 PM
7 points
1 comment
4 min read
LW link

ARENA 5.0 - Call for Applicants

Jan 30, 2025, 1:18 PM
35 points
2 comments
6 min read
LW link

You should read Hobbes, Locke, Hume, and Mill via EarlyModernTexts.com

Arjun Panickssery
Jan 30, 2025, 12:35 PM
51 points
3 comments
3 min read
LW link
(arjunpanickssery.substack.com)

[Question] Should you publish solutions to corrigibility?

rvnnt
Jan 30, 2025, 11:52 AM
13 points
13 comments
1 min read
LW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír
Jan 30, 2025, 10:58 AM
5 points
14 comments
10 min read
LW link
(tetherware.substack.com)

A High Level Closed-Door Session Discussing DeepSeek: Vision Trumps Technology

Cosmia_Nebula
Jan 30, 2025, 9:53 AM
30 points
1 comment
8 min read
LW link
(rentry.co)

Are we the Wolves now? Human Eugenics under AI Control

Brit
Jan 30, 2025, 8:31 AM
−1 points
2 comments
2 min read
LW link

[Question] Why not train reasoning models with RLHF?

Caleb Biddulph
Jan 30, 2025, 7:58 AM
4 points
4 comments
1 min read
LW link

The Road to Evil Is Paved with Good Objectives: Framework to Classify and Fix Misalignments.

Shivam
Jan 30, 2025, 2:44 AM
1 point
0 comments
11 min read
LW link

How *exactly* can AI take your job in the next few years?

Ansh Juneja
Jan 30, 2025, 2:33 AM
9 points
0 comments
21 min read
LW link

Absorbing Your Friends’ Powers

Alice Blair
Jan 30, 2025, 2:32 AM
7 points
1 comment
2 min read
LW link

Detailed Ideal World Benchmark

Knight Lee
Jan 30, 2025, 2:31 AM
5 points
2 comments
2 min read
LW link

Fertility Will Never Recover

Eneasz
Jan 30, 2025, 1:16 AM
11 points
31 comments
2 min read
LW link
(deathisbad.substack.com)

Predation as Payment for Criticism

Benquo
Jan 30, 2025, 1:06 AM
10 points
6 comments
1 min read
LW link
(benjaminrosshoffman.com)

Learn to Develop Your Advantage

ReverendBayes
Jan 29, 2025, 10:06 PM
16 points
1 comment
5 min read
LW link

Revealing alignment faking with a single prompt

Florian_Dietz
Jan 29, 2025, 9:01 PM
9 points
5 comments
4 min read
LW link

Allegory of the Tsunami

Evan Hu
Jan 29, 2025, 7:09 PM
4 points
1 comment
3 min read
LW link

My Mental Model of AI Optimist Opinions

tailcalled
Jan 29, 2025, 6:44 PM
12 points
7 comments
1 min read
LW link

Planning for Extreme AI Risks

joshc
Jan 29, 2025, 6:33 PM
139 points
5 comments
16 min read
LW link

[Question] Does the ChatGPT (web)app sometimes show actual o1 CoTs now?

Sohaib Imran
Jan 29, 2025, 5:27 PM
6 points
6 comments
1 min read
LW link

Dario Amodei: On DeepSeek and Export Controls

Zach Stein-Perlman
Jan 29, 2025, 5:15 PM
53 points
3 comments
1 min read
LW link
(darioamodei.com)

Anthropic CEO calls for RSI

Andrea_Miotti
Jan 29, 2025, 4:54 PM
32 points
10 comments
1 min read
LW link
(darioamodei.com)

Efficiency spectra and “bucket of circuits” cartoons

Dmitry Vaintrob
Jan 29, 2025, 3:06 PM
18 points
0 comments
7 min read
LW link

DeepSeek: Lemon, It’s Wednesday

Zvi
Jan 29, 2025, 3:00 PM
33 points
0 comments
33 min read
LW link
(thezvi.wordpress.com)

How To Prevent a Dystopia

ank
Jan 29, 2025, 2:16 PM
−3 points
4 comments
1 min read
LW link

Whereby: The Zoom alternative you probably haven’t heard of

Itay Dreyfus
Jan 29, 2025, 1:01 PM
4 points
0 comments
7 min read
LW link
(productidentity.co)

[Question] Whose track record of AI predictions would you like to see evaluated?

Jonny Spicer
Jan 29, 2025, 12:05 PM
2 points
3 comments
1 min read
LW link

Paper: Open Problems in Mechanistic Interpretability

Jan 29, 2025, 10:25 AM
68 points
0 comments
1 min read
LW link
(arxiv.org)

Positive jailbreaks in LLMs

dereshev
Jan 29, 2025, 8:41 AM
6 points
0 comments
4 min read
LW link

Untrusted monitoring insights from watching ChatGPT play coordination games

jwfiredragon
Jan 29, 2025, 4:53 AM
14 points
9 comments
9 min read
LW link

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA
Jan 28, 2025, 11:36 PM
115 points
30 comments
13 min read
LW link

Reconceptualizing the Nothingness and Existence

Htarlov
Jan 28, 2025, 8:29 PM
8 points
1 comment
2 min read
LW link

Fake thinking and real thinking

Joe Carlsmith
Jan 28, 2025, 8:05 PM
110 points
13 comments
38 min read
LW link

SAE regularization produces more interpretable models

Jan 28, 2025, 8:02 PM
21 points
7 comments
4 min read
LW link

Operator

Zvi
Jan 28, 2025, 8:00 PM
35 points
1 comment
11 min read
LW link
(thezvi.wordpress.com)

DeepSeek Panic at the App Store

Zvi
Jan 28, 2025, 7:30 PM
51 points
14 comments
33 min read
LW link
(thezvi.wordpress.com)

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes
Jan 28, 2025, 6:47 PM
208 points
26 comments
31 min read
LW link

Detecting out of distribution text with surprisal and entropy

Sandy Fraser
Jan 28, 2025, 6:46 PM
16 points
4 comments
11 min read
LW link