A Platform for Falsifiable Conjectures and Public Refutation — Would This Be Useful?

PetrusNonius 8 Apr 2025 21:09 UTC

1 point

1 comment 1 min read LW link

Quantifying SAE Quality with Feature Steerability Metrics

phenomanon 8 Apr 2025 20:55 UTC

2 points

0 comments 4 min read LW link

MATS is hiring!

Ryan Kidd and VVN

8 Apr 2025 20:45 UTC

8 points

0 comments 6 min read LW link

birds and mammals independently evolved intelligence

bhauth 8 Apr 2025 20:00 UTC

73 points

23 comments 1 min read LW link

(www.quantamagazine.org)

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

John Hughes, abhayesian, Akbir Khan and Fabien Roger

8 Apr 2025 17:32 UTC

147 points

20 comments 12 min read LW link

Thinking Machines

Knight Lee 8 Apr 2025 17:27 UTC

3 points

0 comments 6 min read LW link

Digital Error Correction and Lock-In

alamerton 8 Apr 2025 15:46 UTC

1 point

0 comments 5 min read LW link

(alfielamerton.substack.com)

[Question] What faithfulness metrics should general claims about CoT faithfulness be based upon?

Rauno Arike 8 Apr 2025 15:27 UTC

24 points

0 comments 4 min read LW link

AI 2027: Responses

Zvi 8 Apr 2025 12:50 UTC

111 points

3 comments 30 min read LW link

(thezvi.wordpress.com)

The first AI war will be in your computer

Viliam 8 Apr 2025 9:28 UTC

43 points

10 comments 3 min read LW link

Who wants to bet me $25k at 1:7 odds that there won’t be an AI market crash in the next year?

Remmelt 8 Apr 2025 8:31 UTC

25 points

19 comments 1 min read LW link

Rethinking Friction: Equity and Motivation Across Domains

eltimbalino 8 Apr 2025 3:58 UTC

−1 points

0 comments 2 min read LW link

(www.lesswrong.com)

On different discussion traditions

Eugene Shcherbinin 7 Apr 2025 23:00 UTC

1 point

0 comments 2 min read LW link

Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks

shash42 7 Apr 2025 21:50 UTC

16 points

2 comments 1 min read LW link

AI Safety at the Frontier: Paper Highlights, March ’25

gasteigerjo 7 Apr 2025 20:17 UTC

9 points

0 comments 9 min read LW link

(aisafetyfrontier.substack.com)

Factory farming intelligent minds

Odd anon 7 Apr 2025 20:05 UTC

5 points

6 comments 20 min read LW link

What alignment-relevant abilities might Terence Tao lack?

Towards_Keeperhood 7 Apr 2025 19:44 UTC

13 points

2 comments 3 min read LW link

[Question] Are there any (semi-)detailed future scenarios where we win?

Jan Betley 7 Apr 2025 19:13 UTC

15 points

3 comments 1 min read LW link

Austin Chen on Winning, Risk-Taking, and FTX

Elizabeth 7 Apr 2025 19:00 UTC

35 points

3 comments 1 min read LW link

(acesounderglass.com)

deleted

funnyfranco 7 Apr 2025 18:56 UTC

−24 points

11 comments 1 min read LW link

American College Admissions Doesn’t Need to Be So Competitive

Arjun Panickssery 7 Apr 2025 17:35 UTC

48 points

20 comments 6 min read LW link

(arjunpanickssery.substack.com)

Coupling for Decouplers

Jacob Falkovich 7 Apr 2025 15:40 UTC

16 points

3 comments 8 min read LW link

Moonlight Reflected

Jacob Falkovich 7 Apr 2025 15:35 UTC

11 points

0 comments 9 min read LW link

Navigation by Moonlight

Jacob Falkovich 7 Apr 2025 15:32 UTC

24 points

39 comments 8 min read LW link

You Are Not a Thought Experiment

Jacob Falkovich 7 Apr 2025 15:27 UTC

5 points

0 comments 9 min read LW link

Love is Love, Science is Fake

Jacob Falkovich 7 Apr 2025 15:19 UTC

17 points

2 comments 10 min read LW link

Coupling for Decouplers — Intro

Jacob Falkovich 7 Apr 2025 15:12 UTC

9 points

0 comments 1 min read LW link

The world according to ChatGPT

Richard_Kennaway 7 Apr 2025 13:44 UTC

11 points

0 comments 2 min read LW link

AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander

Zvi 7 Apr 2025 13:40 UTC

67 points

2 comments 26 min read LW link

(thezvi.wordpress.com)

Arguing all sides with ChatGPT 4.5

Richard_Kennaway 7 Apr 2025 13:10 UTC

6 points

0 comments 8 min read LW link

The Same Heaven

Lukas Petersson 7 Apr 2025 12:57 UTC

7 points

1 comment 5 min read LW link

(lukaspetersson.com)

TAMing The Alignment Problem

JasonB 7 Apr 2025 8:47 UTC

11 points

2 comments 11 min read LW link

Well-foundedness as an organizing principle of healthy minds and societies

Richard_Ngo 7 Apr 2025 0:31 UTC

35 points

7 comments 6 min read LW link

(www.mindthefuture.info)

Arusha Perpetual Chicken—an unlikely iterated game

James Stephen Brown 6 Apr 2025 22:56 UTC

15 points

1 comment 5 min read LW link

(nonzerosum.games)

How Gay is the Vatican?

rba 6 Apr 2025 21:27 UTC

63 points

34 comments 7 min read LW link

Australia’s AI Crossroads: Election 2025 Town Hall

Peter Horniak 6 Apr 2025 21:17 UTC

1 point

0 comments 1 min read LW link

The Lizardman and the Black Hat Bobcat

Screwtape 6 Apr 2025 19:02 UTC

109 points

15 comments 9 min read LW link

Would this solve the (outer) alignment problem, or at least help?

Wes R 6 Apr 2025 18:49 UTC

−2 points

1 comment 13 min read LW link

[Question] What are the fundamental differences between teaching the AIs and humans?

StanislavKrym 6 Apr 2025 18:17 UTC

3 points

0 comments 1 min read LW link

An “Optimistic” 2027 Timeline

Yitz 6 Apr 2025 16:39 UTC

13 points

16 comments 9 min read LW link

Thoughts on Creating a Good Language

Towards_Keeperhood 6 Apr 2025 15:57 UTC

1 point

2 comments 7 min read LW link

The REPHRASE Circuit: How Fine-Tuning Enhances LLMs to REPHRASE Text

Karthik Viswanathan 6 Apr 2025 15:02 UTC

4 points

0 comments 5 min read LW link

[Research sprint] Single-model crosscoder feature ablation and steering

Thomas Read 6 Apr 2025 14:42 UTC

10 points

0 comments 12 min read LW link

Ferrer, Pilar, and Me

Askwho 6 Apr 2025 11:22 UTC

21 points

1 comment 4 min read LW link

(open.substack.com)

FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling

Daniil Strizhov 6 Apr 2025 5:27 UTC

1 point

0 comments 7 min read LW link

A collection of approaches to confronting doom, and my thoughts on them

Ruby 6 Apr 2025 2:11 UTC

48 points

18 comments 12 min read LW link

A Slow Guide to Confronting Doom

Ruby 6 Apr 2025 2:10 UTC

86 points

20 comments 14 min read LW link

[Linkpost] Visual roadmap to strong human germline engineering

TsviBT 5 Apr 2025 22:22 UTC

30 points

0 comments 1 min read LW link

Google DeepMind: An Approach to Technical AGI Safety and Security

Rohin Shah 5 Apr 2025 22:00 UTC

73 points

12 comments 18 min read LW link

(arxiv.org)

Introduction to Representing Sentences as Logical Statements

Towards_Keeperhood 5 Apr 2025 20:35 UTC

33 points

10 comments 16 min read LW link