The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Thane Ruthenis · 21 Feb 2025 20:15 UTC
152 points
53 comments · 6 min read · LW link

ParaScopes: Do Language Models Plan the Upcoming Paragraph?

NickyP · 21 Feb 2025 16:50 UTC
36 points
2 comments · 20 min read · LW link

Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought

Lukas Petersson · 21 Feb 2025 15:45 UTC
5 points
0 comments · 5 min read · LW link
(lukaspetersson.com)

On OpenAI’s Model Spec 2.0

Zvi · 21 Feb 2025 14:10 UTC
52 points
4 comments · 43 min read · LW link
(thezvi.wordpress.com)

Longtermist implications of aliens Space-Faring Civilizations—Introduction

Maxime Riché · 21 Feb 2025 12:08 UTC
21 points
0 comments · 6 min read · LW link

MAISU—Minimal AI Safety Unconference

Linda Linsefors · 21 Feb 2025 11:36 UTC
19 points
2 comments · 2 min read · LW link

The case for the death penalty

Yair Halberstadt · 21 Feb 2025 8:30 UTC
26 points
80 comments · 5 min read · LW link

Make Superintelligence Loving

Davey Morse · 21 Feb 2025 6:07 UTC
8 points
9 comments · 5 min read · LW link

Fun, endless art debates v. morally charged art debates that are intrinsically endless

d_el_ez · 21 Feb 2025 4:44 UTC
6 points
2 comments · 2 min read · LW link

The Takeoff Speeds Model Predicts We May Be Entering Crunch Time

johncrox · 21 Feb 2025 2:26 UTC
45 points
3 comments · 18 min read · LW link
(readtheoom.substack.com)

Humans are Just Self Aware Intelligent Biological Machines

asksathvik · 21 Feb 2025 1:03 UTC
3 points
9 comments · 2 min read · LW link
(asksathvik.substack.com)

Pre-ASI: The case for an enlightened mind, capital, and AI literacy in maximizing the good life

Noahh · 21 Feb 2025 0:03 UTC
5 points
5 comments · 6 min read · LW link
(open.substack.com)

Timaeus in 2024

20 Feb 2025 23:54 UTC
99 points
1 comment · 8 min read · LW link

Biological humans collectively exert at most 400 gigabits/s of control over the world.

benwr · 20 Feb 2025 23:44 UTC
15 points
3 comments · 1 min read · LW link

The first RCT for GLP-1 drugs and alcoholism isn’t what we hoped

dynomight · 20 Feb 2025 22:30 UTC
62 points
4 comments · 6 min read · LW link
(dynomight.net)

Published report: Pathways to short TAI timelines

Zershaaneh Qureshi · 20 Feb 2025 22:10 UTC
22 points
0 comments · 17 min read · LW link
(www.convergenceanalysis.org)

Neural Scaling Laws Rooted in the Data Distribution

aribrill · 20 Feb 2025 21:22 UTC
8 points
0 comments · 1 min read · LW link
(arxiv.org)

Demonstrating specification gaming in reasoning models

Matrice Jacobine · 20 Feb 2025 19:26 UTC
4 points
0 comments · 1 min read · LW link
(arxiv.org)

What makes a theory of intelligence useful?

Cole Wyeth · 20 Feb 2025 19:22 UTC
16 points
0 comments · 11 min read · LW link

AI #104: American State Capacity on the Brink

Zvi · 20 Feb 2025 14:50 UTC
37 points
9 comments · 44 min read · LW link
(thezvi.wordpress.com)

US AI Safety Institute will be ‘gutted,’ Axios reports

Matrice Jacobine · 20 Feb 2025 14:40 UTC
11 points
1 comment · 1 min read · LW link
(www.zdnet.com)

Human-AI Relationality is Already Here

bridgebot · 20 Feb 2025 7:08 UTC
17 points
0 comments · 15 min read · LW link

Safe Distillation With a Powerful Untrusted AI

Alek Westover · 20 Feb 2025 3:14 UTC
5 points
1 comment · 5 min read · LW link

Modularity and assembly: AI safety via thinking smaller

D Wong · 20 Feb 2025 0:58 UTC
2 points
0 comments · 11 min read · LW link
(criticalreason.substack.com)

Eliezer’s Lost Alignment Articles / The Arbital Sequence

20 Feb 2025 0:48 UTC
207 points
10 comments · 5 min read · LW link

Arbital has been imported to LessWrong

20 Feb 2025 0:47 UTC
281 points
30 comments · 5 min read · LW link

The Dilemma’s Dilemma

James Stephen Brown · 19 Feb 2025 23:50 UTC
9 points
12 comments · 7 min read · LW link
(nonzerosum.games)

[Question] Why do we have the NATO logo?

KvmanThinking · 19 Feb 2025 22:59 UTC
1 point
4 comments · 1 min read · LW link

Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap

19 Feb 2025 22:42 UTC
13 points
0 comments · 13 min read · LW link
(www.metaculus.com)

Several Arguments Against the Mathematical Universe Hypothesis

Vittu Perkele · 19 Feb 2025 22:13 UTC
−4 points
6 comments · 3 min read · LW link
(open.substack.com)

Literature Review of Text AutoEncoders

NickyP · 19 Feb 2025 21:54 UTC
20 points
5 comments · 8 min read · LW link

DeepSeek Made it Even Harder for US AI Companies to Ever Reach Profitability

garrison · 19 Feb 2025 21:02 UTC
10 points
1 comment · 3 min read · LW link
(garrisonlovely.substack.com)

Won’t vs. Can’t: Sandbagging-like Behavior from Claude Models

19 Feb 2025 20:47 UTC
15 points
1 comment · 1 min read · LW link
(alignment.anthropic.com)

AI Alignment and the Financial War Against Narcissistic Manipulation

henophilia · 19 Feb 2025 20:42 UTC
−17 points
2 comments · 3 min read · LW link

How to Make Superbabies

19 Feb 2025 20:39 UTC
625 points
358 comments · 31 min read · LW link

The Newbie’s Guide to Navigating AI Futures

keithjmenezes · 19 Feb 2025 20:37 UTC
−1 points
0 comments · 40 min read · LW link

Against Unlimited Genius for Baby-Killers

ggggg · 19 Feb 2025 20:33 UTC
−7 points
1 comment · 3 min read · LW link
(ggggggggggggggggggggggg.substack.com)

New LLM Scaling Law

wrmedford · 19 Feb 2025 20:21 UTC
2 points
0 comments · 1 min read · LW link
(github.com)

Go Grok Yourself

Zvi · 19 Feb 2025 20:20 UTC
57 points
2 comments · 17 min read · LW link
(thezvi.wordpress.com)

[Question] Take over my project: do computable agents plan against the universal distribution pessimistically?

Cole Wyeth · 19 Feb 2025 20:17 UTC
25 points
3 comments · 3 min read · LW link

When should we worry about AI power-seeking?

Joe Carlsmith · 19 Feb 2025 19:44 UTC
22 points
0 comments · 18 min read · LW link
(joecarlsmith.substack.com)

SuperBabies podcast with Gene Smith

Eneasz · 19 Feb 2025 19:36 UTC
35 points
1 comment · 1 min read · LW link
(thebayesianconspiracy.substack.com)

Undesirable Conclusions and Origin Adjustment

Jerdle · 19 Feb 2025 18:35 UTC
3 points
0 comments · 5 min read · LW link

How might we safely pass the buck to AI?

joshc · 19 Feb 2025 17:48 UTC
83 points
58 comments · 31 min read · LW link

Using Prompt Evaluation to Combat Bio-Weapon Research

19 Feb 2025 12:39 UTC
11 points
2 comments · 3 min read · LW link

Intelligence Is Jagged

Adam Train · 19 Feb 2025 7:08 UTC
6 points
1 comment · 3 min read · LW link

Closed-ended questions aren’t as hard as you think

electroswing · 19 Feb 2025 3:53 UTC
6 points
0 comments · 3 min read · LW link

Undergrad AI Safety Conference

JoNeedsSleep · 19 Feb 2025 3:43 UTC
19 points
0 comments · 1 min read · LW link

Permanent properties of things are a self-fulfilling prophecy

YanLyutnev · 19 Feb 2025 0:08 UTC
4 points
0 comments · 9 min read · LW link

Places of Loving Grace [Story]

ank · 18 Feb 2025 23:49 UTC
−1 points
0 comments · 4 min read · LW link