Notes on the Presidential Election of 1836

Arjun Panickssery · 13 Feb 2025 23:40 UTC
23 points
0 comments · 7 min read · LW link
(arjunpanickssery.substack.com)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank · 13 Feb 2025 22:35 UTC
1 point
2 comments · 11 min read · LW link

I’m making a ttrpg about life in an intentional community during the last year before the Singularity

bgaesop · 13 Feb 2025 21:54 UTC
11 points
2 comments · 2 min read · LW link

SWE Automation Is Coming: Consider Selling Your Crypto

A_donor · 13 Feb 2025 20:17 UTC
12 points
8 comments · 1 min read · LW link

≤10-year Timelines Remain Unlikely Despite DeepSeek and o3

Rafael Harth · 13 Feb 2025 19:21 UTC
52 points
67 comments · 15 min read · LW link

System 2 Alignment

Seth Herd · 13 Feb 2025 19:17 UTC
35 points
0 comments · 22 min read · LW link

Murder plots are infohazards

Chris Monteiro · 13 Feb 2025 19:15 UTC
311 points
44 comments · 2 min read · LW link

Sparse Autoencoder Feature Ablation for Unlearning

aludert · 13 Feb 2025 19:13 UTC
3 points
0 comments · 11 min read · LW link

What is it to solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:42 UTC
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

Self-dialogue: Do behaviorist rewards make scheming AGIs?

Steven Byrnes · 13 Feb 2025 18:39 UTC
43 points
1 comment · 46 min read · LW link

How do we solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:27 UTC
63 points
9 comments · 7 min read · LW link
(joecarlsmith.substack.com)

Ambiguous out-of-distribution generalization on an algorithmic task

13 Feb 2025 18:24 UTC
83 points
6 comments · 11 min read · LW link

Teaching AI to reason: this year’s most important story

Benjamin_Todd · 13 Feb 2025 17:40 UTC
10 points
0 comments · 10 min read · LW link
(benjamintodd.substack.com)

AI #103: Show Me the Money

Zvi · 13 Feb 2025 15:20 UTC
30 points
9 comments · 58 min read · LW link
(thezvi.wordpress.com)

OpenAI’s NSFW policy: user safety, harm reduction, and AI consent

8e9 · 13 Feb 2025 13:59 UTC
4 points
3 comments · 2 min read · LW link

Studies of Human Error Rate

tin482 · 13 Feb 2025 13:43 UTC
15 points
3 comments · 1 min read · LW link

the dumbest theory of everything

lostinwilliamsburg · 13 Feb 2025 7:57 UTC
−1 points
0 comments · 7 min read · LW link

Skepticism towards claims about the views of powerful institutions

tlevin · 13 Feb 2025 7:40 UTC
46 points
2 comments · 4 min read · LW link

Virtue signaling, and the “humans-are-wonderful” bias, as a trust exercise

lc · 13 Feb 2025 6:59 UTC
44 points
16 comments · 4 min read · LW link

My model of what is going on with LLMs

Cole Wyeth · 13 Feb 2025 3:43 UTC
110 points
49 comments · 7 min read · LW link

Not all capabilities will be created equal: focus on strategically superhuman agents

benwr · 13 Feb 2025 1:24 UTC
62 points
9 comments · 3 min read · LW link

LLMs can teach themselves to better predict the future

Ben Turtel · 13 Feb 2025 1:01 UTC
0 points
1 comment · 1 min read · LW link
(arxiv.org)

Dovetail’s agent foundations fellowship talks & discussion

Alex_Altair · 13 Feb 2025 0:49 UTC
10 points
0 comments · 1 min read · LW link

Extended analogy between humans, corporations, and AIs.

Daniel Kokotajlo · 13 Feb 2025 0:03 UTC
36 points
2 comments · 6 min read · LW link

Moral Hazard in Democratic Voting

lsusr · 12 Feb 2025 23:17 UTC
20 points
8 comments · 1 min read · LW link

MATS Spring 2024 Extension Retrospective

12 Feb 2025 22:43 UTC
26 points
1 comment · 15 min read · LW link

Hunting for AI Hackers: LLM Agent Honeypot

12 Feb 2025 20:29 UTC
35 points
0 comments · 5 min read · LW link
(www.apartresearch.com)

Probability of AI-Caused Disaster

Alvin Ånestrand · 12 Feb 2025 19:40 UTC
2 points
2 comments · 10 min read · LW link
(forecastingaifutures.substack.com)

Two flaws in the Machiavelli Benchmark

TheManxLoiner · 12 Feb 2025 19:34 UTC
24 points
0 comments · 3 min read · LW link

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A

DieSab · 12 Feb 2025 19:16 UTC
2 points
0 comments · 10 min read · LW link

Are current LLMs safe for psychotherapy?

CanYouFeelTheBenefits · 12 Feb 2025 19:16 UTC
5 points
4 comments · 1 min read · LW link

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts

Ana Kapros · 12 Feb 2025 19:12 UTC
7 points
0 comments · 5 min read · LW link

The Paris AI Anti-Safety Summit

Zvi · 12 Feb 2025 14:00 UTC
129 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

Inside the dark forests of the internet

Itay Dreyfus · 12 Feb 2025 10:20 UTC
10 points
0 comments · 6 min read · LW link
(productidentity.co)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine · 12 Feb 2025 9:15 UTC
53 points
49 comments · 1 min read · LW link
(www.emergent-values.ai)

Why you maybe should lift weights, and How to.

samusasuke · 12 Feb 2025 5:15 UTC
33 points
30 comments · 9 min read · LW link

[Question] how do the CEOs respond to our concerns?

KvmanThinking · 11 Feb 2025 23:39 UTC
−10 points
7 comments · 1 min read · LW link

Where Would Good Forecasts Most Help AI Governance Efforts?

Violet Hour · 11 Feb 2025 18:15 UTC
11 points
1 comment · 6 min read · LW link

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjo · 11 Feb 2025 16:14 UTC
7 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

If Neuroscientists Succeed

Mordechai Rorvig · 11 Feb 2025 15:33 UTC
9 points
6 comments · 18 min read · LW link

The News is Never Neglected

lsusr · 11 Feb 2025 14:59 UTC
113 points
18 comments · 1 min read · LW link

Rethinking AI Safety Approach in the Era of Open-Source AI

Weibing Wang · 11 Feb 2025 14:01 UTC
4 points
0 comments · 6 min read · LW link

What About The Horses?

Maxwell Tabarrok · 11 Feb 2025 13:59 UTC
15 points
17 comments · 7 min read · LW link
(www.maximum-progress.com)

On Deliberative Alignment

Zvi · 11 Feb 2025 13:00 UTC
53 points
1 comment · 6 min read · LW link
(thezvi.wordpress.com)

Detecting AI Agent Failure Modes in Simulations

Michael Soareverix · 11 Feb 2025 11:10 UTC
17 points
0 comments · 8 min read · LW link

World Citizen Assembly about AI—Announcement

Camille Berger · 11 Feb 2025 10:51 UTC
26 points
1 comment · 5 min read · LW link

Visual Reference for Frontier Large Language Models

kenakofer · 11 Feb 2025 5:14 UTC
14 points
0 comments · 1 min read · LW link
(kenan.schaefkofer.com)

Rational Effective Utopia & Narrow Way There: Math-Proven Safe Static Multiversal mAX-Intelligence (AXI), Multiversal Alignment, New Ethicophysics… (Aug 11)

ank · 11 Feb 2025 3:21 UTC
13 points
8 comments · 38 min read · LW link

Arguing for the Truth? An Inference-Only Study into AI Debate

denisemester · 11 Feb 2025 3:04 UTC
7 points
0 comments · 16 min read · LW link

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?

garrison · 11 Feb 2025 0:20 UTC
208 points
8 comments · 6 min read · LW link
(garrisonlovely.substack.com)