Building AI Research Fleets

12 Jan 2025 18:23 UTC
132 points
11 comments · 5 min read · LW link

Do Antidepressants work? (First Take)

Jacob Goldsmith · 12 Jan 2025 17:11 UTC
7 points
9 comments · 7 min read · LW link

A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

resonova · 12 Jan 2025 17:11 UTC
0 points
8 comments · 3 min read · LW link

How quickly could robots scale up?

Benjamin_Todd · 12 Jan 2025 17:01 UTC
46 points
25 comments · 1 min read · LW link
(benjamintodd.substack.com)

AGI Will Not Make Labor Worthless

Maxwell Tabarrok · 12 Jan 2025 15:09 UTC
−8 points
16 comments · 5 min read · LW link
(www.maximum-progress.com)

The purposeful drunkard

Dmitry Vaintrob · 12 Jan 2025 12:27 UTC
98 points
13 comments · 6 min read · LW link

No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR

Mikhail Samin · 12 Jan 2025 11:43 UTC
81 points
21 comments · 1 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges.

Roland Pihlakas · 12 Jan 2025 3:37 UTC
47 points
7 comments · 12 min read · LW link

Extending control evaluations to non-scheming threats

joshc · 12 Jan 2025 1:42 UTC
30 points
1 comment · 12 min read · LW link

Rolling Thresholds for AGI Scaling Regulation

Larks · 12 Jan 2025 1:30 UTC
40 points
6 comments · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · 11 Jan 2025 22:54 UTC
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Fluoridation: The RCT We Still Haven’t Run (But Should)

ChristianKl · 11 Jan 2025 21:02 UTC
22 points
5 comments · 2 min read · LW link

In Defense of a Butlerian Jihad

sloonz · 11 Jan 2025 19:30 UTC
10 points
25 comments · 9 min read · LW link

Near term discussions need something smaller and more concrete than AGI

ryan_b · 11 Jan 2025 18:24 UTC
13 points
0 comments · 6 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · 11 Jan 2025 14:43 UTC
6 points
0 comments · 2 min read · LW link

Have frontier AI systems surpassed the self-replicating red line?

nsage · 11 Jan 2025 5:31 UTC
4 points
0 comments · 4 min read · LW link

We need a universal definition of ‘agency’ and related words

CstineSublime · 11 Jan 2025 3:22 UTC
18 points
1 comment · 5 min read · LW link

[Question] AI for medical care for hard-to-treat diseases?

CronoDAS · 10 Jan 2025 23:55 UTC
12 points
1 comment · 1 min read · LW link

Beliefs and state of mind into 2025

RussellThor · 10 Jan 2025 22:07 UTC
18 points
10 comments · 7 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · 10 Jan 2025 19:34 UTC
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Is AI Alignment Enough?

Aram Panasenco · 10 Jan 2025 18:57 UTC
30 points
6 comments · 6 min read · LW link

[Question] What are some scenarios where an aligned AGI actually helps humanity, but many/most people don’t like it?

RomanS · 10 Jan 2025 18:13 UTC
14 points
6 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · 10 Jan 2025 16:53 UTC
147 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

10 Jan 2025 16:22 UTC
31 points
0 comments · 4 min read · LW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi · 10 Jan 2025 13:50 UTC
44 points
7 comments · 27 min read · LW link
(thezvi.wordpress.com)

Scaling Sparse Feature Circuit Finding to Gemma 9B

10 Jan 2025 11:08 UTC
86 points
11 comments · 17 min read · LW link

[Question] Is Musk still net-positive for humanity?

mikbp · 10 Jan 2025 9:34 UTC
−5 points
18 comments · 1 min read · LW link

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis

Matt Levinson · 10 Jan 2025 6:53 UTC
4 points
0 comments · 4 min read · LW link

Dmitry’s Koan

Dmitry Vaintrob · 10 Jan 2025 4:27 UTC
44 points
8 comments · 22 min read · LW link

NAO Updates, January 2025

jefftk · 10 Jan 2025 3:37 UTC
23 points
0 comments · 3 min read · LW link
(naobservatory.org)

MATS mentor selection

10 Jan 2025 3:12 UTC
44 points
12 comments · 6 min read · LW link

AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

ChristianWilliams · 10 Jan 2025 3:02 UTC
7 points
0 comments · 2 min read · LW link
(www.metaculus.com)

[Question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)

CstineSublime · 10 Jan 2025 2:44 UTC
7 points
1 comment · 2 min read · LW link

You are too dumb to understand insurance

Lorec · 9 Jan 2025 23:33 UTC
1 point
12 comments · 7 min read · LW link

Is AI Hitting a Wall or Moving Faster Than Ever?

garrison · 9 Jan 2025 22:18 UTC
12 points
5 comments · 5 min read · LW link
(garrisonlovely.substack.com)

Expevolu, Part II: Buying land to create countries

Fernando · 9 Jan 2025 21:11 UTC
4 points
0 comments · 20 min read · LW link
(expevolu.substack.com)

Last week of the Discussion Phase

Raemon · 9 Jan 2025 19:26 UTC
35 points
0 comments · 3 min read · LW link

Discursive Warfare and Faction Formation

Benquo · 9 Jan 2025 16:47 UTC
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)

Can we rescue Effective Altruism?

Elizabeth · 9 Jan 2025 16:40 UTC
20 points
0 comments · 1 min read · LW link
(acesounderglass.com)

AI #98: World Ends With Six Word Story

Zvi · 9 Jan 2025 16:30 UTC
36 points
2 comments · 38 min read · LW link
(thezvi.wordpress.com)

Many Worlds and the Problems of Evil

Jonah Wilberg · 9 Jan 2025 16:10 UTC
−3 points
2 comments · 9 min read · LW link

PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement

9 Jan 2025 14:23 UTC
20 points
0 comments · 1 min read · LW link

The “Everyone Can’t Be Wrong” Prior causes AI risk denial but helped prehistoric people

Knight Lee · 9 Jan 2025 5:54 UTC
1 point
0 comments · 2 min read · LW link

Governance Course—Week 1 Reflections

Alice Blair · 9 Jan 2025 4:48 UTC
4 points
1 comment · 5 min read · LW link

Thoughts on the In-Context Scheming AI Experiment

ExCeph · 9 Jan 2025 2:19 UTC
2 points
0 comments · 4 min read · LW link

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities

Tom DAVID · 9 Jan 2025 0:18 UTC
2 points
0 comments · 3 min read · LW link

Gothenburg LW / ACX meetup

Stefan · 8 Jan 2025 21:39 UTC
2 points
0 comments · 1 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery · 8 Jan 2025 19:38 UTC
108 points
7 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

[Question] What is the most impressive game LLMs can play well?

Cole Wyeth · 8 Jan 2025 19:38 UTC
19 points
20 comments · 1 min read · LW link

The Type of Writing that Pushes Women Away

Dahlia · 8 Jan 2025 18:54 UTC
23 points
4 comments · 2 min read · LW link