My latest attempt to understand decision theory: I asked ChatGPT to debate me.

bokov · Jan 13, 2025, 7:37 PM
−8 points
5 comments · 19 min read · LW link

AI models inherently alter “human values.” So, alignment-based AI safety approaches must better account for value drift

bfitzgerald3132 · Jan 13, 2025, 7:22 PM
5 points
2 comments · 13 min read · LW link

Chance is in the Map, not the Territory

Jan 13, 2025, 7:17 PM
67 points
18 comments · 7 min read · LW link

Progress links and short notes, 2025-01-13

jasoncrawford · Jan 13, 2025, 6:35 PM
13 points
2 comments · 3 min read · LW link
(newsletter.rootsofprogress.org)

Better antibodies by engineering targets, not engineering antibodies (Nabla Bio)

Abhishaike Mahajan · Jan 13, 2025, 3:05 PM
4 points
0 comments · 14 min read · LW link
(www.owlposting.com)

Zvi’s 2024 In Movies

Zvi · Jan 13, 2025, 1:40 PM
44 points
4 comments · 15 min read · LW link
(thezvi.wordpress.com)

Paper club: He et al. on modular arithmetic (part I)

Dmitry Vaintrob · Jan 13, 2025, 11:18 AM
14 points
0 comments · 8 min read · LW link

Cast it into the fire! Destroy it!

Aram Panasenco · Jan 13, 2025, 7:30 AM
6 points
9 comments · 2 min read · LW link

Moderately More Than You Wanted To Know: Depressive Realism

JustisMills · Jan 13, 2025, 2:57 AM
73 points
4 comments · 6 min read · LW link
(justismills.substack.com)

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · Jan 13, 2025, 1:23 AM
144 points
32 comments · 3 min read · LW link

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
130 points
11 comments · 5 min read · LW link

Do Antidepressants work? (First Take)

Jacob Goldsmith · Jan 12, 2025, 5:11 PM
7 points
9 comments · 7 min read · LW link

A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

resonova · Jan 12, 2025, 5:11 PM
0 points
8 comments · 3 min read · LW link

How quickly could robots scale up?

Benjamin_Todd · Jan 12, 2025, 5:01 PM
47 points
25 comments · 1 min read · LW link
(benjamintodd.substack.com)

AGI Will Not Make Labor Worthless

Maxwell Tabarrok · Jan 12, 2025, 3:09 PM
−7 points
16 comments · 5 min read · LW link
(www.maximum-progress.com)

The purposeful drunkard

Dmitry Vaintrob · Jan 12, 2025, 12:27 PM
98 points
13 comments · 6 min read · LW link

No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR

Mikhail Samin · Jan 12, 2025, 11:43 AM
80 points
21 comments · 1 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
46 points
7 comments · 10 min read · LW link

Extending control evaluations to non-scheming threats

joshc · Jan 12, 2025, 1:42 AM
30 points
1 comment · 12 min read · LW link

Rolling Thresholds for AGI Scaling Regulation

Larks · Jan 12, 2025, 1:30 AM
40 points
6 comments · LW link

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · Jan 11, 2025, 10:54 PM
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Fluoridation: The RCT We Still Haven’t Run (But Should)

ChristianKl · Jan 11, 2025, 9:02 PM
22 points
5 comments · 2 min read · LW link

In Defense of a Butlerian Jihad

sloonz · Jan 11, 2025, 7:30 PM
10 points
25 comments · 9 min read · LW link

Near term discussions need something smaller and more concrete than AGI

ryan_b · Jan 11, 2025, 6:24 PM
13 points
0 comments · 6 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · Jan 11, 2025, 2:43 PM
6 points
0 comments · 2 min read · LW link

Have frontier AI systems surpassed the self-replicating red line?

nsage · Jan 11, 2025, 5:31 AM
4 points
0 comments · 4 min read · LW link

We need a universal definition of ‘agency’ and related words

CstineSublime · Jan 11, 2025, 3:22 AM
18 points
1 comment · 5 min read · LW link

[Question] AI for medical care for hard-to-treat diseases?

CronoDAS · Jan 10, 2025, 11:55 PM
12 points
1 comment · 1 min read · LW link

Beliefs and state of mind into 2025

RussellThor · Jan 10, 2025, 10:07 PM
18 points
9 comments · 7 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · Jan 10, 2025, 7:34 PM
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Is AI Alignment Enough?

Aram Panasenco · Jan 10, 2025, 6:57 PM
28 points
6 comments · 6 min read · LW link

[Question] What are some scenarios where an aligned AGI actually helps humanity, but many/most people don’t like it?

RomanS · Jan 10, 2025, 6:13 PM
13 points
6 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
143 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

Jan 10, 2025, 4:22 PM
28 points
0 comments · 4 min read · LW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi · Jan 10, 2025, 1:50 PM
44 points
7 comments · 27 min read · LW link
(thezvi.wordpress.com)

Scaling Sparse Feature Circuit Finding to Gemma 9B

Jan 10, 2025, 11:08 AM
86 points
11 comments · 17 min read · LW link

[Question] Is Musk still net-positive for humanity?

mikbp · Jan 10, 2025, 9:34 AM
−5 points
18 comments · 1 min read · LW link

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis

Matt Levinson · Jan 10, 2025, 6:53 AM
4 points
0 comments · 4 min read · LW link

Dmitry’s Koan

Dmitry Vaintrob · Jan 10, 2025, 4:27 AM
44 points
8 comments · 22 min read · LW link

NAO Updates, January 2025

jefftk · Jan 10, 2025, 3:37 AM
23 points
0 comments · LW link
(naobservatory.org)

MATS mentor selection

Jan 10, 2025, 3:12 AM
44 points
12 comments · 6 min read · LW link

AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

ChristianWilliams · Jan 10, 2025, 3:02 AM
7 points
0 comments · LW link
(www.metaculus.com)

[Question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)

CstineSublime · Jan 10, 2025, 2:44 AM
7 points
1 comment · 2 min read · LW link

Deleted

Yanling Guo · Jan 10, 2025, 1:36 AM
−10 points
0 comments · 1 min read · LW link

You are too dumb to understand insurance

Lorec · Jan 9, 2025, 11:33 PM
1 point
12 comments · 7 min read · LW link

Is AI Hitting a Wall or Moving Faster Than Ever?

garrison · Jan 9, 2025, 10:18 PM
12 points
5 comments · LW link
(garrisonlovely.substack.com)

Expevolu, Part II: Buying land to create countries

Fernando · Jan 9, 2025, 9:11 PM
4 points
0 comments · 20 min read · LW link
(expevolu.substack.com)

Last week of the Discussion Phase

Raemon · Jan 9, 2025, 7:26 PM
35 points
0 comments · 3 min read · LW link

Discursive Warfare and Faction Formation

Benquo · Jan 9, 2025, 4:47 PM
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)

Can we rescue Effective Altruism?

Elizabeth · Jan 9, 2025, 4:40 PM
21 points
0 comments · 1 min read · LW link
(acesounderglass.com)