AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · 11 Jan 2025 22:54 UTC
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Fluoridation: The RCT We Still Haven’t Run (But Should)

ChristianKl · 11 Jan 2025 21:02 UTC
22 points
5 comments · 2 min read · LW link

In Defense of a Butlerian Jihad

sloonz · 11 Jan 2025 19:30 UTC
10 points
25 comments · 9 min read · LW link

Near term discussions need something smaller and more concrete than AGI

ryan_b · 11 Jan 2025 18:24 UTC
13 points
0 comments · 6 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · 11 Jan 2025 14:43 UTC
6 points
0 comments · 2 min read · LW link

Have frontier AI systems surpassed the self-replicating red line?

nsage · 11 Jan 2025 5:31 UTC
4 points
0 comments · 4 min read · LW link

We need a universal definition of ‘agency’ and related words

CstineSublime · 11 Jan 2025 3:22 UTC
18 points
1 comment · 5 min read · LW link

[Question] AI for medical care for hard-to-treat diseases?

CronoDAS · 10 Jan 2025 23:55 UTC
12 points
1 comment · 1 min read · LW link

Beliefs and state of mind into 2025

RussellThor · 10 Jan 2025 22:07 UTC
18 points
10 comments · 7 min read · LW link

Recommendations for Technical AI Safety Research Directions

Sam Marks · 10 Jan 2025 19:34 UTC
64 points
1 comment · 17 min read · LW link
(alignment.anthropic.com)

Is AI Alignment Enough?

Aram Panasenco · 10 Jan 2025 18:57 UTC
30 points
6 comments · 6 min read · LW link

[Question] What are some scenarios where an aligned AGI actually helps humanity, but many/most people don’t like it?

RomanS · 10 Jan 2025 18:13 UTC
14 points
6 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · 10 Jan 2025 16:53 UTC
147 points
56 comments · 8 min read · LW link
(forethoughtnewsletter.substack.com)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective

10 Jan 2025 16:22 UTC
31 points
0 comments · 4 min read · LW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi · 10 Jan 2025 13:50 UTC
44 points
7 comments · 27 min read · LW link
(thezvi.wordpress.com)

Scaling Sparse Feature Circuit Finding to Gemma 9B

10 Jan 2025 11:08 UTC
86 points
11 comments · 17 min read · LW link

[Question] Is Musk still net-positive for humanity?

mikbp · 10 Jan 2025 9:34 UTC
−5 points
18 comments · 1 min read · LW link

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis

Matt Levinson · 10 Jan 2025 6:53 UTC
4 points
0 comments · 4 min read · LW link

Dmitry’s Koan

Dmitry Vaintrob · 10 Jan 2025 4:27 UTC
44 points
8 comments · 22 min read · LW link

NAO Updates, January 2025

jefftk · 10 Jan 2025 3:37 UTC
23 points
0 comments · 3 min read · LW link
(naobservatory.org)

MATS mentor selection

10 Jan 2025 3:12 UTC
44 points
12 comments · 6 min read · LW link

AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open

ChristianWilliams · 10 Jan 2025 3:02 UTC
7 points
0 comments · 2 min read · LW link
(www.metaculus.com)

[Question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)

CstineSublime · 10 Jan 2025 2:44 UTC
7 points
1 comment · 2 min read · LW link

You are too dumb to understand insurance

Lorec · 9 Jan 2025 23:33 UTC
1 point
12 comments · 7 min read · LW link

Is AI Hitting a Wall or Moving Faster Than Ever?

garrison · 9 Jan 2025 22:18 UTC
12 points
5 comments · 5 min read · LW link
(garrisonlovely.substack.com)

Expevolu, Part II: Buying land to create countries

Fernando · 9 Jan 2025 21:11 UTC
4 points
0 comments · 20 min read · LW link
(expevolu.substack.com)

Last week of the Discussion Phase

Raemon · 9 Jan 2025 19:26 UTC
35 points
0 comments · 3 min read · LW link

Discursive Warfare and Faction Formation

Benquo · 9 Jan 2025 16:47 UTC
52 points
3 comments · 3 min read · LW link
(benjaminrosshoffman.com)

Can we rescue Effective Altruism?

Elizabeth · 9 Jan 2025 16:40 UTC
20 points
0 comments · 1 min read · LW link
(acesounderglass.com)

AI #98: World Ends With Six Word Story

Zvi · 9 Jan 2025 16:30 UTC
36 points
2 comments · 38 min read · LW link
(thezvi.wordpress.com)

Many Worlds and the Problems of Evil

Jonah Wilberg · 9 Jan 2025 16:10 UTC
−3 points
2 comments · 9 min read · LW link

PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement

9 Jan 2025 14:23 UTC
20 points
0 comments · 1 min read · LW link

The “Everyone Can’t Be Wrong” Prior causes AI risk denial but helped prehistoric people

Knight Lee · 9 Jan 2025 5:54 UTC
1 point
0 comments · 2 min read · LW link

Governance Course—Week 1 Reflections

Alice Blair · 9 Jan 2025 4:48 UTC
4 points
1 comment · 5 min read · LW link

Thoughts on the In-Context Scheming AI Experiment

ExCeph · 9 Jan 2025 2:19 UTC
2 points
0 comments · 4 min read · LW link

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities

Tom DAVID · 9 Jan 2025 0:18 UTC
2 points
0 comments · 3 min read · LW link

Gothenburg LW / ACX meetup

Stefan · 8 Jan 2025 21:39 UTC
2 points
0 comments · 1 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery · 8 Jan 2025 19:38 UTC
108 points
7 comments · 3 min read · LW link
(arjunpanickssery.substack.com)

[Question] What is the most impressive game LLMs can play well?

Cole Wyeth · 8 Jan 2025 19:38 UTC
19 points
20 comments · 1 min read · LW link

The Type of Writing that Pushes Women Away

Dahlia · 8 Jan 2025 18:54 UTC
23 points
4 comments · 2 min read · LW link

Ann Altman has filed a lawsuit in US federal court alleging that she was sexually abused by Sam Altman

quanticle · 8 Jan 2025 14:59 UTC
7 points
3 comments · 1 min read · LW link

AI Safety Outreach Seminar & Social (online)

Linda Linsefors · 8 Jan 2025 13:25 UTC
9 points
0 comments · 1 min read · LW link

XX by Rian Hughes: Pretentious Bullshit

Yair Halberstadt · 8 Jan 2025 13:02 UTC
33 points
5 comments · 5 min read · LW link

Activation space interpretability may be doomed

8 Jan 2025 12:49 UTC
152 points
34 comments · 8 min read · LW link

AI Safety as a YC Startup

Lukas Petersson · 8 Jan 2025 10:46 UTC
58 points
9 comments · 5 min read · LW link

The absolute basics of representation theory of finite groups

Dmitry Vaintrob · 8 Jan 2025 9:47 UTC
21 points
1 comment · 10 min read · LW link

Implications of the AI Security Gap

Dan Braun · 8 Jan 2025 8:31 UTC
46 points
0 comments · 9 min read · LW link

What are polysemantic neurons?

8 Jan 2025 7:35 UTC
9 points
0 comments · 4 min read · LW link
(aisafety.info)

Tips On Empirical Research Slides

8 Jan 2025 5:06 UTC
97 points
4 comments · 6 min read · LW link

On Eating the Sun

jessicata · 8 Jan 2025 4:57 UTC
96 points
99 comments · 3 min read · LW link
(unstablerontology.substack.com)