18 Apr 2025 22:56 UTC

12 points

0 comments 13 min read LW link

LLM-based Fact Checking for Popular Posts?

azergante 18 Apr 2025 21:26 UTC

1 point

2 comments 62 min read LW link

o3 Will Use Its Tools For You

Zvi 18 Apr 2025 21:20 UTC

46 points

3 comments 45 min read LW link

(thezvi.wordpress.com)

AI Control Methods Literature Review

Ram Potham 18 Apr 2025 21:15 UTC

12 points

1 comment 9 min read LW link

Consequentialists should have a comprehensive set of deontological beliefs they adhere to

Jay95 18 Apr 2025 20:50 UTC

3 points

2 comments 1 min read LW link

What Makes an AI Startup “Net Positive” for Safety?

jacquesthibs 18 Apr 2025 20:33 UTC

82 points

23 comments 2 min read LW link

Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning

Jeremias Ferrao 18 Apr 2025 19:34 UTC

10 points

0 comments 10 min read LW link

Evaluating Collaborative AI Performance Subject to Sabotage

Matthew Khoriaty 18 Apr 2025 19:33 UTC

3 points

0 comments 19 min read LW link

Inside OpenAI’s Controversial Plan to Abandon its Nonprofit Roots

garrison 18 Apr 2025 18:46 UTC

21 points

0 comments 11 min read LW link

(garrisonlovely.substack.com)

Could LLMs Learn to Detect Bias Autonomously, Like Tesla’s Self-Driving Cars?

Omnipheasant 18 Apr 2025 18:45 UTC

0 points

0 comments 3 min read LW link

Scaffolding Skills

Screwtape 18 Apr 2025 17:39 UTC

37 points

9 comments 4 min read LW link

[Rockville] Rationalist Shabbat

maia 18 Apr 2025 15:38 UTC

8 points

0 comments 1 min read LW link

Handling schemers if shutdown is not an option

Buck 18 Apr 2025 14:39 UTC

43 points

2 comments 14 min read LW link

British and American Connotations

jefftk 18 Apr 2025 13:00 UTC

14 points

4 comments 1 min read LW link

(www.jefftk.com)

Towards Understanding the Representation of Belief State Geometry in Transformers

Karthik Viswanathan 18 Apr 2025 12:39 UTC

6 points

0 comments 12 min read LW link

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo 18 Apr 2025 12:27 UTC

149 points

16 comments 6 min read LW link

Karma Tests in Logical Counterfactual Simulations motivates strong agents to protect weak agents

Knight Lee 18 Apr 2025 11:11 UTC

9 points

8 comments 3 min read LW link

What If Galaxies Are Alive and Atoms Have Minds? A Thought Experiment on Life Across Scales

Saif Khan 18 Apr 2025 10:01 UTC

−2 points

5 comments 3 min read LW link

Three Months In, Evaluating Three Rationalist Cases for Trump

Arjun Panickssery 18 Apr 2025 8:27 UTC

118 points

33 comments 4 min read LW link

[Question] Comprehensive up-to-date resources on the Chinese Communist Party’s AI strategy, etc?

Mateusz Bagiński 18 Apr 2025 4:58 UTC

14 points

6 comments 1 min read LW link

Conditional Forecasting as Model Parameterization

Molly 18 Apr 2025 2:35 UTC

15 points

0 comments 7 min read LW link

(cuttyshark.substack.com)

One Night in Delphi

Eggs 18 Apr 2025 2:17 UTC

4 points

2 comments 3 min read LW link

The Russell Conjugation Illuminator

TimmyM 17 Apr 2025 19:33 UTC

51 points

14 comments 1 min read LW link

(russellconjugations.com)

Announcing Progress Conference 2025

jasoncrawford 17 Apr 2025 17:12 UTC

12 points

0 comments 1 min read LW link

(newsletter.rootsofprogress.org)

The Mirror Paradox

Jeremy Kraybill 17 Apr 2025 16:23 UTC

−6 points

0 comments 1 min read LW link

Memory Decoding Journal Club

Devin Ward 17 Apr 2025 16:19 UTC

1 point

0 comments 1 min read LW link

Host Keys and SSHing to EC2

jefftk 17 Apr 2025 15:10 UTC

10 points

6 comments 1 min read LW link

(www.jefftk.com)

AI #112: Release the Everything

Zvi 17 Apr 2025 15:10 UTC

41 points

6 comments 40 min read LW link

(thezvi.wordpress.com)

On AI personhood

p.b. 17 Apr 2025 12:31 UTC

4 points

7 comments 1 min read LW link

Automating Mechanistic Interpretability via Program Synthesis

Edy Nastase 17 Apr 2025 10:58 UTC

1 point

1 comment 1 min read LW link

Understanding and overcoming AGI apathy

Dhruv Sumathi 17 Apr 2025 1:04 UTC

25 points

1 comment 13 min read LW link

(dhruvsumathi.substack.com)

ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs

denkenberger 16 Apr 2025 21:47 UTC

49 points

9 comments 3 min read LW link

Prodromes and Biomarkers in Chronic Disease

sarahconstantin 16 Apr 2025 21:30 UTC

23 points

2 comments 3 min read LW link

(sarahconstantin.substack.com)

The Practical Imperative for AI Control Research

Archana Vaidheeswaran 16 Apr 2025 20:27 UTC

1 point

0 comments 4 min read LW link

METR’s preliminary evaluation of o3 and o4-mini

Christopher King 16 Apr 2025 20:23 UTC

14 points

7 comments 1 min read LW link

(metr.github.io)

Mass Exposure Paradox

max-sixty 16 Apr 2025 20:18 UTC

6 points

2 comments 2 min read LW link

GPT-4.5 is Cognitive Empathy, Sonnet 3.5 is Affective Empathy

Jack 16 Apr 2025 19:12 UTC

15 points

2 comments 4 min read LW link

GPT-4.1 Is a Mini Upgrade

Zvi 16 Apr 2025 19:00 UTC

31 points

6 comments 8 min read LW link

(thezvi.wordpress.com)

Doing Prioritization Better

arvomm 16 Apr 2025 18:46 UTC

3 points

1 comment 19 min read LW link

(forum.effectivealtruism.org)

Kamelo: A Rule-Based Constructed Language for Universal, Logical Communication

Saif Khan 16 Apr 2025 18:44 UTC

13 points

8 comments 2 min read LW link

Understanding Trust: Overview Presentations

abramdemski 16 Apr 2025 18:08 UTC

22 points

0 comments 1 min read LW link

Understanding Trust—Overview Presentations

abramdemski 16 Apr 2025 18:05 UTC

13 points

0 comments 1 min read LW link

Telescoping

za3k 16 Apr 2025 17:05 UTC

13 points

1 comment 1 min read LW link

(blog.za3k.com)

Finance and AI Timelines

DAL 16 Apr 2025 16:55 UTC

5 points

2 comments 3 min read LW link

AI-enabled coups: a small group could use AI to seize power

Tom Davidson, Lukas Finnveden and rosehadshar

16 Apr 2025 16:51 UTC

138 points

23 comments 7 min read LW link

Ctrl-Z: Controlling AI Agents via Resampling

Aryan Bhatt, Buck, Adam Kaufman and Tyler Tracy

16 Apr 2025 16:21 UTC

128 points

0 comments 20 min read LW link

Gamify life from BayesianMind

Fire Brito de S, Gabriel 16 Apr 2025 16:17 UTC

6 points

2 comments 1 min read LW link

Top OpenAI Catastrophic Risk Official Steps Down Abruptly

garrison 16 Apr 2025 16:04 UTC

14 points

0 comments 5 min read LW link

(garrisonlovely.substack.com)

An artistic illustration of Scalable Oversight—“A world apart, neither gods nor mortals”

Marius Adrian Nicoară 16 Apr 2025 12:41 UTC

1 point

0 comments 1 min read LW link

Can LLM-based models do model-based planning?

Jennifer Lin 16 Apr 2025 12:38 UTC

11 points

1 comment 2 min read LW link

(docs.google.com)