Moral Hazard in Democratic Voting

lsusr 12 Feb 2025 23:17 UTC

21 points

11 comments 1 min read LW link

MATS Spring 2024 Extension Retrospective

HenningB, Matthew Wearden, Cameron Holmes and Ryan Kidd

12 Feb 2025 22:43 UTC

27 points

1 comment 15 min read LW link

Hunting for AI Hackers: LLM Agent Honeypot

Reworr R and jacobhaimes

12 Feb 2025 20:29 UTC

35 points

1 comment 5 min read LW link

(www.apartresearch.com)

Probability of AI-Caused Disaster

Alvin Ånestrand 12 Feb 2025 19:40 UTC

2 points

2 comments 10 min read LW link

(forecastingaifutures.substack.com)

Two flaws in the Machiavelli Benchmark

TheManxLoiner 12 Feb 2025 19:34 UTC

25 points

0 comments 3 min read LW link

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A

DieSab 12 Feb 2025 19:16 UTC

3 points

0 comments 10 min read LW link

Are current LLMs safe for psychotherapy?

CanYouFeelTheBenefits 12 Feb 2025 19:16 UTC

5 points

4 comments 1 min read LW link

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts

Ana Kapros 12 Feb 2025 19:12 UTC

7 points

0 comments 5 min read LW link

The Paris AI Anti-Safety Summit

Zvi 12 Feb 2025 14:00 UTC

129 points

22 comments 21 min read LW link

(thezvi.wordpress.com)

Inside the dark forests of the internet

Itay Dreyfus 12 Feb 2025 10:20 UTC

10 points

0 comments 6 min read LW link

(productidentity.co)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine 12 Feb 2025 9:15 UTC

51 points

51 comments 1 min read LW link

(www.emergent-values.ai)

Why you maybe should lift weights, and How to.

samusasuke 12 Feb 2025 5:15 UTC

37 points

30 comments 9 min read LW link

[Question] how do the CEOs respond to our concerns?

KvmanThinking 11 Feb 2025 23:39 UTC

−10 points

7 comments 1 min read LW link

Where Would Good Forecasts Most Help AI Governance Efforts?

Violet Hour 11 Feb 2025 18:15 UTC

11 points

1 comment 6 min read LW link

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjo 11 Feb 2025 16:14 UTC

7 points

0 comments 8 min read LW link

(aisafetyfrontier.substack.com)

If Neuroscientists Succeed

Mordechai Rorvig 11 Feb 2025 15:33 UTC

9 points

6 comments 18 min read LW link

The News is Never Neglected

lsusr 11 Feb 2025 14:59 UTC

113 points

18 comments 1 min read LW link

Rethinking AI Safety Approach in the Era of Open-Source AI

Weibing Wang 11 Feb 2025 14:01 UTC

4 points

0 comments 6 min read LW link

What About The Horses?

Maxwell Tabarrok 11 Feb 2025 13:59 UTC

16 points

17 comments 7 min read LW link

(www.maximum-progress.com)

On Deliberative Alignment

Zvi 11 Feb 2025 13:00 UTC

51 points

2 comments 6 min read LW link

(thezvi.wordpress.com)

Detecting AI Agent Failure Modes in Simulations

Michael Soareverix 11 Feb 2025 11:10 UTC

17 points

0 comments 8 min read LW link

World Citizen Assembly about AI—Announcement

Camille B. 11 Feb 2025 10:51 UTC

26 points

1 comment 5 min read LW link

Visual Reference for Frontier Large Language Models

kenakofer 11 Feb 2025 5:14 UTC

14 points

0 comments 1 min read LW link

(kenan.schaefkofer.com)

Effective Utopia & Startup Way There: Ethophysics, Ethicality Equation That Makes Super-Intelligence Want to Bring the Best Futures For All & Static mAX-Intelligence (AXI)

ank 11 Feb 2025 3:21 UTC

13 points

8 comments 40 min read LW link

Arguing for the Truth? An Inference-Only Study into AI Debate

denisemester 11 Feb 2025 3:04 UTC

7 points

0 comments 16 min read LW link

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?

garrison 11 Feb 2025 0:20 UTC

208 points

8 comments 6 min read LW link

(garrisonlovely.substack.com)

Positive Directions

G Wood 11 Feb 2025 0:00 UTC

0 points

0 comments 4 min read LW link

Logical Correlation

niplav 10 Feb 2025 23:29 UTC

24 points

7 comments 10 min read LW link

Proof idea: SLT to AIT

Lucius Bushnaq 10 Feb 2025 23:14 UTC

42 points

15 comments 6 min read LW link

LW/ACX social meetup

Stefan 10 Feb 2025 21:12 UTC

2 points

0 comments 1 min read LW link

A Bearish Take on AI, as a Treat

rats 10 Feb 2025 19:22 UTC

11 points

0 comments 4 min read LW link

(open.substack.com)

Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable

Oliver Oswald 10 Feb 2025 19:19 UTC

6 points

8 comments 2 min read LW link

Claude is More Anxious than GPT; Personality is an axis of interpretability in language models

future_detective 10 Feb 2025 19:19 UTC

2 points

2 comments 8 min read LW link

(dhealy.substack.com)

Notes on Occam via Solomonoff vs. hierarchical Bayes

JesseClifton 10 Feb 2025 17:55 UTC

29 points

7 comments 4 min read LW link

Sleeping Beauty: an Accuracy-based Approach

glauberdebona 10 Feb 2025 15:40 UTC

7 points

2 comments 7 min read LW link

Political Idolatry

Arturo Macias 10 Feb 2025 15:26 UTC

−8 points

7 comments 2 min read LW link

ML4Good Colombia—Applications Open to LatAm Participants

Alejandro Acelas and Manuela García

10 Feb 2025 15:03 UTC

5 points

0 comments 1 min read LW link

Nonpartisan AI safety

Yair Halberstadt 10 Feb 2025 14:55 UTC

30 points

4 comments 2 min read LW link

Opinion Article Scoring System

ciaran 10 Feb 2025 14:32 UTC

1 point

0 comments 5 min read LW link

Levels of Friction

Zvi 10 Feb 2025 13:10 UTC

162 points

9 comments 12 min read LW link

(thezvi.wordpress.com)

Baumol effect vs Jevons paradox

Hzn 10 Feb 2025 8:28 UTC

0 points

0 comments 1 min read LW link

(hzn33.neocities.org)

[Question] A Simulation of Automation economics?

qbolec 10 Feb 2025 8:11 UTC

10 points

1 comment 1 min read LW link

[Question] Should I Divest from AI?

Oliver Kuperman 10 Feb 2025 3:29 UTC

6 points

4 comments 1 min read LW link

OpenAI lied about SFT vs. RLHF

sanxiyn 10 Feb 2025 3:24 UTC

10 points

2 comments 1 min read LW link

(x.com)

“Self-Blackmail” and Alternatives

jessicata 9 Feb 2025 23:20 UTC

20 points

12 comments 7 min read LW link

(unstableontology.com)

Altman blog on post-AGI world

Julian Bradshaw 9 Feb 2025 21:52 UTC

29 points

10 comments 1 min read LW link

(blog.samaltman.com)

Forecasting newsletter #2/2025: Forecasting meetup network

NunoSempere 9 Feb 2025 18:07 UTC

13 points

0 comments 4 min read LW link

(forecasting.substack.com)

How identical twin sisters feel about nieces vs their own daughters

Dave92F1 9 Feb 2025 17:36 UTC

4 points

19 comments 1 min read LW link

Two hemispheres—I do not think it means what you think it means

Viliam 9 Feb 2025 15:33 UTC

117 points

22 comments 14 min read LW link

The Structure of Professional Revolutions

SebastianG 9 Feb 2025 13:23 UTC

8 points

0 comments 4 min read LW link