My latest attempt to understand decision theory: I asked ChatGPT to debate me.

bokov · Jan 13, 2025, 7:37 PM
−8 points
5 comments · 19 min read · LW link

AI models inherently alter “human values.” So, alignment-based AI safety approaches must better account for value drift

bfitzgerald3132 · Jan 13, 2025, 7:22 PM
5 points
2 comments · 13 min read · LW link

Chance is in the Map, not the Territory

Jan 13, 2025, 7:17 PM
67 points
18 comments · 7 min read · LW link

Progress links and short notes, 2025-01-13

jasoncrawford · Jan 13, 2025, 6:35 PM
13 points
2 comments · 3 min read · LW link
(newsletter.rootsofprogress.org)

Better antibodies by engineering targets, not engineering antibodies (Nabla Bio)

Abhishaike Mahajan · Jan 13, 2025, 3:05 PM
4 points
0 comments · 14 min read · LW link
(www.owlposting.com)

Zvi’s 2024 In Movies

Zvi · Jan 13, 2025, 1:40 PM
44 points
4 comments · 15 min read · LW link
(thezvi.wordpress.com)

Paper club: He et al. on modular arithmetic (part I)

Dmitry Vaintrob · Jan 13, 2025, 11:18 AM
14 points
0 comments · 8 min read · LW link

Cast it into the fire! Destroy it!

Aram Panasenco · Jan 13, 2025, 7:30 AM
6 points
9 comments · 2 min read · LW link

Moderately More Than You Wanted To Know: Depressive Realism

JustisMills · Jan 13, 2025, 2:57 AM
73 points
4 comments · 6 min read · LW link
(justismills.substack.com)

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · Jan 13, 2025, 1:23 AM
144 points
32 comments · 3 min read · LW link

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
130 points
11 comments · 5 min read · LW link

Do Antidepressants work? (First Take)

Jacob Goldsmith · Jan 12, 2025, 5:11 PM
7 points
9 comments · 7 min read · LW link

A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

resonova · Jan 12, 2025, 5:11 PM
0 points
8 comments · 3 min read · LW link

How quickly could robots scale up?

Benjamin_Todd · Jan 12, 2025, 5:01 PM
47 points
25 comments · 1 min read · LW link
(benjamintodd.substack.com)

AGI Will Not Make Labor Worthless

Maxwell Tabarrok · Jan 12, 2025, 3:09 PM
−7 points
16 comments · 5 min read · LW link
(www.maximum-progress.com)

The purposeful drunkard

Dmitry Vaintrob · Jan 12, 2025, 12:27 PM
98 points
13 comments · 6 min read · LW link

No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR

Mikhail Samin · Jan 12, 2025, 11:43 AM
80 points
21 comments · 1 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
46 points
7 comments · 10 min read · LW link

Extending control evaluations to non-scheming threats

joshc · Jan 12, 2025, 1:42 AM
30 points
1 comment · 12 min read · LW link

Rolling Thresholds for AGI Scaling Regulation

Larks · Jan 12, 2025, 1:30 AM
40 points
6 comments · LW link

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · Jan 11, 2025, 10:54 PM
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Fluoridation: The RCT We Still Haven’t Run (But Should)

ChristianKl · Jan 11, 2025, 9:02 PM
22 points
5 comments · 2 min read · LW link

In Defense of a Butlerian Jihad

sloonz · Jan 11, 2025, 7:30 PM
10 points
25 comments · 9 min read · LW link

Near term discussions need something smaller and more concrete than AGI

ryan_b · Jan 11, 2025, 6:24 PM
13 points
0 comments · 6 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · Jan 11, 2025, 2:43 PM
6 points
0 comments · 2 min read · LW link

Have frontier AI systems surpassed the self-replicating red line?

nsage · Jan 11, 2025, 5:31 AM
4 points
0 comments · 4 min read · LW link

We need a universal definition of ‘agency’ and related words

CstineSublime · Jan 11, 2025, 3:22 AM
18 points
1 comment · 5 min read · LW link

[Question] AI for medical care for hard-to-treat diseases?

CronoDAS · Jan 10, 2025, 11:55 PM
12 points
1 comment · 1 min read · LW link

Beliefs and state of mind into 2025

RussellThor · Jan 10, 2025, 10:07 PM
18 points
9 comments · 7 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · Jan 10, 2025, 7:34 PM
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Is AI Alignment Enough?

Aram Panasenco · Jan 10, 2025, 6:57 PM
28 points
6 comments · 6 min read · LW link

[Question] What are some scenarios where an aligned AGI actually helps humanity, but many/most people don’t like it?

RomanS · Jan 10, 2025, 6:13 PM
13 points
6 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
143 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

Jan 10, 2025, 4:22 PM
28 points
0 comments · 4 min read · LW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi · Jan 10, 2025, 1:50 PM
44 points
7 comments · 27 min read · LW link
(thezvi.wordpress.com)

Scaling Sparse Feature Circuit Finding to Gemma 9B

Jan 10, 2025, 11:08 AM
86 points
11 comments · 17 min read · LW link

[Question] Is Musk still net-positive for humanity?

mikbp · Jan 10, 2025, 9:34 AM
−5 points
18 comments · 1 min read · LW link

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis

Matt Levinson · Jan 10, 2025, 6:53 AM
4 points
0 comments · 4 min read · LW link

Dmitry’s Koan

Dmitry Vaintrob · Jan 10, 2025, 4:27 AM
44 points
8 comments · 22 min read · LW link

NAO Updates, January 2025

jefftk · Jan 10, 2025, 3:37 AM
23 points
0 comments · LW link
(naobservatory.org)

MATS mentor selection

Jan 10, 2025, 3:12 AM
44 points
12 comments · 6 min read · LW link

AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

ChristianWilliams · Jan 10, 2025, 3:02 AM
7 points
0 comments · LW link
(www.metaculus.com)

[Question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)

CstineSublime · Jan 10, 2025, 2:44 AM
7 points
1 comment · 2 min read · LW link

Deleted

Yanling Guo · Jan 10, 2025, 1:36 AM
−10 points
0 comments · 1 min read · LW link

You are too dumb to understand insurance

Lorec · Jan 9, 2025, 11:33 PM
1 point
12 comments · 7 min read · LW link

Is AI Hitting a Wall or Moving Faster Than Ever?

garrison · Jan 9, 2025, 10:18 PM
12 points
5 comments · LW link
(garrisonlovely.substack.com)

Expevolu, Part II: Buying land to create countries

Fernando · Jan 9, 2025, 9:11 PM
4 points
0 comments · 20 min read · LW link
(expevolu.substack.com)

Last week of the Discussion Phase

Raemon · Jan 9, 2025, 7:26 PM
35 points
0 comments · 3 min read · LW link

Discursive Warfare and Faction Formation

Benquo · Jan 9, 2025, 4:47 PM
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)

Can we rescue Effective Altruism?

Elizabeth · Jan 9, 2025, 4:40 PM
21 points
0 comments · 1 min read · LW link
(acesounderglass.com)