Building AI Research Fleets

12 Jan 2025 18:23 UTC
132 points
11 comments · 5 min read · LW link

Do Antidepressants work? (First Take)

Jacob Goldsmith · 12 Jan 2025 17:11 UTC
7 points
9 comments · 7 min read · LW link

A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

resonova · 12 Jan 2025 17:11 UTC
0 points
8 comments · 3 min read · LW link

How quickly could robots scale up?

Benjamin_Todd · 12 Jan 2025 17:01 UTC
46 points
25 comments · 1 min read · LW link
(benjamintodd.substack.com)

AGI Will Not Make Labor Worthless

Maxwell Tabarrok · 12 Jan 2025 15:09 UTC
−8 points
16 comments · 5 min read · LW link
(www.maximum-progress.com)

The purposeful drunkard

Dmitry Vaintrob · 12 Jan 2025 12:27 UTC
98 points
13 comments · 6 min read · LW link

No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR

Mikhail Samin · 12 Jan 2025 11:43 UTC
81 points
21 comments · 1 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges.

Roland Pihlakas · 12 Jan 2025 3:37 UTC
47 points
7 comments · 12 min read · LW link

Extending control evaluations to non-scheming threats

joshc · 12 Jan 2025 1:42 UTC
30 points
1 comment · 12 min read · LW link

Rolling Thresholds for AGI Scaling Regulation

Larks · 12 Jan 2025 1:30 UTC
40 points
6 comments · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · 11 Jan 2025 22:54 UTC
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Fluoridation: The RCT We Still Haven’t Run (But Should)

ChristianKl · 11 Jan 2025 21:02 UTC
22 points
5 comments · 2 min read · LW link

In Defense of a Butlerian Jihad

sloonz · 11 Jan 2025 19:30 UTC
10 points
25 comments · 9 min read · LW link

Near term discussions need something smaller and more concrete than AGI

ryan_b · 11 Jan 2025 18:24 UTC
13 points
0 comments · 6 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · 11 Jan 2025 14:43 UTC
6 points
0 comments · 2 min read · LW link

Have frontier AI systems surpassed the self-replicating red line?

nsage · 11 Jan 2025 5:31 UTC
4 points
0 comments · 4 min read · LW link

We need a universal definition of ‘agency’ and related words

CstineSublime · 11 Jan 2025 3:22 UTC
18 points
1 comment · 5 min read · LW link

[Question] AI for medical care for hard-to-treat diseases?

CronoDAS · 10 Jan 2025 23:55 UTC
12 points
1 comment · 1 min read · LW link

Beliefs and state of mind into 2025

RussellThor · 10 Jan 2025 22:07 UTC
18 points
10 comments · 7 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · 10 Jan 2025 19:34 UTC
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Is AI Alignment Enough?

Aram Panasenco · 10 Jan 2025 18:57 UTC
30 points
6 comments · 6 min read · LW link

[Question] What are some scenarios where an aligned AGI actually helps humanity, but many/most people don’t like it?

RomanS · 10 Jan 2025 18:13 UTC
14 points
6 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · 10 Jan 2025 16:53 UTC
147 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

10 Jan 2025 16:22 UTC
31 points
0 comments · 4 min read · LW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi · 10 Jan 2025 13:50 UTC
44 points
7 comments · 27 min read · LW link
(thezvi.wordpress.com)

Scaling Sparse Feature Circuit Finding to Gemma 9B

10 Jan 2025 11:08 UTC
86 points
11 comments · 17 min read · LW link

[Question] Is Musk still net-positive for humanity?

mikbp · 10 Jan 2025 9:34 UTC
−5 points
18 comments · 1 min read · LW link

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis

Matt Levinson · 10 Jan 2025 6:53 UTC
4 points
0 comments · 4 min read · LW link

Dmitry’s Koan

Dmitry Vaintrob · 10 Jan 2025 4:27 UTC
44 points
8 comments · 22 min read · LW link

NAO Updates, January 2025

jefftk · 10 Jan 2025 3:37 UTC
23 points
0 comments · 3 min read · LW link
(naobservatory.org)

MATS mentor selection

10 Jan 2025 3:12 UTC
44 points
12 comments · 6 min read · LW link

AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

ChristianWilliams · 10 Jan 2025 3:02 UTC
7 points
0 comments · 2 min read · LW link
(www.metaculus.com)

[Question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)

CstineSublime · 10 Jan 2025 2:44 UTC
7 points
1 comment · 2 min read · LW link

You are too dumb to understand insurance

Lorec · 9 Jan 2025 23:33 UTC
1 point
12 comments · 7 min read · LW link

Is AI Hitting a Wall or Moving Faster Than Ever?

garrison · 9 Jan 2025 22:18 UTC
12 points
5 comments · 5 min read · LW link
(garrisonlovely.substack.com)

Expevolu, Part II: Buying land to create countries

Fernando · 9 Jan 2025 21:11 UTC
4 points
0 comments · 20 min read · LW link
(expevolu.substack.com)

Last week of the Discussion Phase

Raemon · 9 Jan 2025 19:26 UTC
35 points
0 comments · 3 min read · LW link

Discursive Warfare and Faction Formation

Benquo · 9 Jan 2025 16:47 UTC
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)

Can we rescue Effective Altruism?

Elizabeth · 9 Jan 2025 16:40 UTC
20 points
0 comments · 1 min read · LW link
(acesounderglass.com)

AI #98: World Ends With Six Word Story

Zvi · 9 Jan 2025 16:30 UTC
36 points
2 comments · 38 min read · LW link
(thezvi.wordpress.com)

Many Worlds and the Problems of Evil

Jonah Wilberg · 9 Jan 2025 16:10 UTC
−3 points
2 comments · 9 min read · LW link

PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement

9 Jan 2025 14:23 UTC
20 points
0 comments · 1 min read · LW link

The “Everyone Can’t Be Wrong” Prior causes AI risk denial but helped prehistoric people

Knight Lee · 9 Jan 2025 5:54 UTC
1 point
0 comments · 2 min read · LW link

Governance Course—Week 1 Reflections

Alice Blair · 9 Jan 2025 4:48 UTC
4 points
1 comment · 5 min read · LW link

Thoughts on the In-Context Scheming AI Experiment

ExCeph · 9 Jan 2025 2:19 UTC
2 points
0 comments · 4 min read · LW link

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities

Tom DAVID · 9 Jan 2025 0:18 UTC
2 points
0 comments · 3 min read · LW link

Gothenburg LW / ACX meetup

Stefan · 8 Jan 2025 21:39 UTC
2 points
0 comments · 1 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery · 8 Jan 2025 19:38 UTC
108 points
7 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

[Question] What is the most impressive game LLMs can play well?

Cole Wyeth · 8 Jan 2025 19:38 UTC
19 points
20 comments · 1 min read · LW link

The Type of Writing that Pushes Women Away

Dahlia · 8 Jan 2025 18:54 UTC
23 points
4 comments · 2 min read · LW link