Out­law Code

Commander Zander, 30 Jan 2025 23:41 UTC
10 points
1 comment, 2 min read, LW link

Can some­one, any­one, make su­per­in­tel­li­gence a more con­crete con­cept?

Ori Nagel, 30 Jan 2025 23:25 UTC
3 points
6 comments, 4 min read, LW link

Up­com­ing Neu­ro­science Work­shop—Func­tion­al­iz­ing Brain Data, Ground-Truthing, and the Role of Ar­tifi­cial Data in Ad­vanc­ing Neuroscience

Devin Ward, 30 Jan 2025 23:02 UTC
1 point
0 comments, 1 min read, LW link

What’s Be­hind the SynBio Bust?

sarahconstantin, 30 Jan 2025 22:30 UTC
55 points
8 comments, 6 min read, LW link
(sarahconstantin.substack.com)

The fu­ture of hu­man­ity is in management

jasoncrawford, 30 Jan 2025 22:14 UTC
3 points
5 comments, 13 min read, LW link
(newsletter.rootsofprogress.org)

[Trans­la­tion] AI Gen­er­ated Fake News is Tak­ing Over my Fam­ily Group Chat

mushroomsoup, 30 Jan 2025 20:24 UTC
3 points
0 comments, 6 min read, LW link

A sketch of an AI con­trol safety case

30 Jan 2025 17:28 UTC
61 points
0 comments, 5 min read, LW link

Grad­ual Disem­pow­er­ment: Sys­temic Ex­is­ten­tial Risks from In­cre­men­tal AI Development

30 Jan 2025 17:03 UTC
167 points
65 comments, 2 min read, LW link
(gradual-disempowerment.ai)

[Question] Im­pli­ca­tion of Un­com­putable Problems

Nathan1123, 30 Jan 2025 16:48 UTC
−3 points
3 comments, 1 min read, LW link

Hello World

Charlie Sanders, 30 Jan 2025 15:33 UTC
7 points
0 comments, 2 min read, LW link
(www.dailymicrofiction.com)

In­tro­duc­ing the Coal­i­tion for a Baruch Plan for AI: A Call for a Rad­i­cal Treaty-Mak­ing pro­cess for the Global Gover­nance of AI

rguerreschi, 30 Jan 2025 15:26 UTC
11 points
0 comments, 2 min read, LW link

AI #101: The Shal­low End

Zvi, 30 Jan 2025 14:50 UTC
39 points
1 comment, 59 min read, LW link
(thezvi.wordpress.com)

Me­moriza­tion-gen­er­al­iza­tion in practice

Dmitry Vaintrob, 30 Jan 2025 14:10 UTC
7 points
1 comment, 4 min read, LW link

ARENA 5.0 - Call for Applicants

30 Jan 2025 13:18 UTC
35 points
2 comments, 6 min read, LW link

You should read Hobbes, Locke, Hume, and Mill via Ear­lyModernTexts.com

Arjun Panickssery, 30 Jan 2025 12:35 UTC
52 points
3 comments, 3 min read, LW link
(arjunpanickssery.substack.com)

[Question] Should you pub­lish solu­tions to cor­rigi­bil­ity?

rvnnt, 30 Jan 2025 11:52 UTC
13 points
13 comments, 1 min read, LW link

Tether­ware #1: The case for hu­man­like AI with free will

Jáchym Fibír, 30 Jan 2025 10:58 UTC
5 points
14 comments, 10 min read, LW link
(tetherware.substack.com)

A High Level Closed-Door Ses­sion Dis­cussing Deep­Seek: Vi­sion Trumps Technology

Cosmia_Nebula, 30 Jan 2025 9:53 UTC
30 points
1 comment, 8 min read, LW link
(rentry.co)

Are we the Wolves now? Hu­man Eu­gen­ics un­der AI Control

Brit, 30 Jan 2025 8:31 UTC
−1 points
2 comments, 2 min read, LW link

[Question] Why not train rea­son­ing mod­els with RLHF?

Caleb Biddulph, 30 Jan 2025 7:58 UTC
4 points
4 comments, 1 min read, LW link

The Road to Evil Is Paved with Good Ob­jec­tives: Frame­work to Clas­sify and Fix Misal­ign­ments.

Shivam, 30 Jan 2025 2:44 UTC
1 point
0 comments, 11 min read, LW link

How *ex­actly* can AI take your job in the next few years?

Ansh Juneja, 30 Jan 2025 2:33 UTC
9 points
0 comments, 21 min read, LW link

Ab­sorb­ing Your Friends’ Powers

Alice Blair, 30 Jan 2025 2:32 UTC
8 points
1 comment, 2 min read, LW link

De­tailed Ideal World Benchmark

Knight Lee, 30 Jan 2025 2:31 UTC
5 points
2 comments, 2 min read, LW link

Fer­til­ity Will Never Recover

Eneasz, 30 Jan 2025 1:16 UTC
17 points
31 comments, 2 min read, LW link
(deathisbad.substack.com)

Pre­da­tion as Pay­ment for Criticism

Benquo, 30 Jan 2025 1:06 UTC
10 points
6 comments, 1 min read, LW link
(benjaminrosshoffman.com)

Learn to Develop Your Advantage

ReverendBayes, 29 Jan 2025 22:06 UTC
16 points
1 comment, 5 min read, LW link

Re­veal­ing al­ign­ment fak­ing with a sin­gle prompt

Florian_Dietz, 29 Jan 2025 21:01 UTC
9 points
5 comments, 4 min read, LW link

Alle­gory of the Tsunami

Evan Hu, 29 Jan 2025 19:09 UTC
4 points
1 comment, 3 min read, LW link

My Men­tal Model of AI Op­ti­mist Opinions

tailcalled, 29 Jan 2025 18:44 UTC
14 points
7 comments, 1 min read, LW link

Plan­ning for Ex­treme AI Risks

joshc, 29 Jan 2025 18:33 UTC
143 points
5 comments, 16 min read, LW link

[Question] Does the ChatGPT (web)app some­times show ac­tual o1 CoTs now?

Sohaib Imran, 29 Jan 2025 17:27 UTC
6 points
6 comments, 1 min read, LW link

Dario Amodei: On Deep­Seek and Ex­port Controls

Zach Stein-Perlman, 29 Jan 2025 17:15 UTC
53 points
3 comments, 1 min read, LW link
(darioamodei.com)

An­thropic CEO calls for RSI

Andrea_Miotti, 29 Jan 2025 16:54 UTC
32 points
10 comments, 1 min read, LW link
(darioamodei.com)

Effi­ciency spec­tra and “bucket of cir­cuits” cartoons

Dmitry Vaintrob, 29 Jan 2025 15:06 UTC
20 points
0 comments, 7 min read, LW link

Deep­Seek: Le­mon, It’s Wednesday

Zvi, 29 Jan 2025 15:00 UTC
33 points
0 comments, 33 min read, LW link
(thezvi.wordpress.com)

How To Prevent a Dystopia

ank, 29 Jan 2025 14:16 UTC
−3 points
4 comments, 1 min read, LW link

Whereby: The Zoom al­ter­na­tive you prob­a­bly haven’t heard of

Itay Dreyfus, 29 Jan 2025 13:01 UTC
4 points
0 comments, 7 min read, LW link
(productidentity.co)

[Question] Whose track record of AI pre­dic­tions would you like to see eval­u­ated?

Jonny Spicer, 29 Jan 2025 12:05 UTC
2 points
3 comments, 1 min read, LW link

Paper: Open Prob­lems in Mechanis­tic Interpretability

29 Jan 2025 10:25 UTC
71 points
0 comments, 1 min read, LW link
(arxiv.org)

Pos­i­tive jailbreaks in LLMs

dereshev, 29 Jan 2025 8:41 UTC
6 points
0 comments, 4 min read, LW link

Un­trusted mon­i­tor­ing in­sights from watch­ing ChatGPT play co­or­di­na­tion games

jwfiredragon, 29 Jan 2025 4:53 UTC
14 points
8 comments, 9 min read, LW link

The Game Board has been Flipped: Now is a good time to re­think what you’re doing

LintzA, 28 Jan 2025 23:36 UTC
118 points
30 comments, 13 min read, LW link

Recon­cep­tu­al­iz­ing the Noth­ing­ness and Existence

Htarlov, 28 Jan 2025 20:29 UTC
8 points
1 comment, 2 min read, LW link

Fake think­ing and real thinking

Joe Carlsmith, 28 Jan 2025 20:05 UTC
111 points
17 comments, 38 min read, LW link

SAE reg­u­lariza­tion pro­duces more in­ter­pretable models

28 Jan 2025 20:02 UTC
21 points
7 comments, 4 min read, LW link

Operator

Zvi, 28 Jan 2025 20:00 UTC
35 points
1 comment, 11 min read, LW link
(thezvi.wordpress.com)

Deep­Seek Panic at the App Store

Zvi, 28 Jan 2025 19:30 UTC
51 points
14 comments, 33 min read, LW link
(thezvi.wordpress.com)

“Sharp Left Turn” dis­course: An opinionated review

Steven Byrnes, 28 Jan 2025 18:47 UTC
220 points
31 comments, 31 min read, LW link

De­tect­ing out of dis­tri­bu­tion text with sur­prisal and entropy

Sandy Fraser, 28 Jan 2025 18:46 UTC
24 points
4 comments, 11 min read, LW link