Did Claude 3 Opus align itself via gradient hacking?

Fiora Starlight 21 Feb 2026 22:24 UTC

391 points

49 comments 20 min read LW link

If you don’t feel deeply confused about AGI risk, something’s wrong

Dave Banerjee 21 Feb 2026 15:34 UTC

95 points

18 comments 5 min read LW link

(open.substack.com)

Ponzi schemes as a demonstration of out-of-distribution generalization

TFD 21 Feb 2026 13:19 UTC

9 points

2 comments 6 min read LW link

(www.thefloatingdroid.com)

LLMs and Literature: Where Value Actually Comes From

derelict5432 21 Feb 2026 13:16 UTC

13 points

13 comments 4 min read LW link

The Spectre haunting the “AI Safety” Community

Gabriel Alfour 21 Feb 2026 11:14 UTC

233 points

28 comments 6 min read LW link

(cognition.cafe)

LessWrong’s goals overlap HowTruthful’s

Bruce Lewis 21 Feb 2026 4:19 UTC

7 points

4 comments 2 min read LW link

Alignment to Evil

Matrice Jacobine 21 Feb 2026 3:29 UTC

61 points

12 comments 1 min read LW link

(tetraspace.substack.com)

Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?

RogerDearnaley 21 Feb 2026 1:59 UTC

40 points

4 comments 5 min read LW link

Robert Sapolsky Is Simply Not Talking About Compatibilism

Julius 21 Feb 2026 1:27 UTC

26 points

4 comments 8 min read LW link

(thegreymatter.substack.com)

TT Self Study Journal # 7

TristanTrim 21 Feb 2026 1:22 UTC

13 points

2 comments 4 min read LW link

How will we do SFT on models with opaque reasoning?

Alek Westover, Vivek Hebbar and egan

21 Feb 2026 0:00 UTC

32 points

17 comments 7 min read LW link

Agent-first context menus

Surya Kasturi 20 Feb 2026 23:45 UTC

3 points

1 comment 2 min read LW link

Human perception of relational knowledge on graphical interfaces

Surya Kasturi 20 Feb 2026 23:45 UTC

3 points

1 comment 1 min read LW link

Hodoscope: Visualization for Efficient Human Supervision

Ziqian Zhong and Shashwat Saxena

20 Feb 2026 23:41 UTC

9 points

0 comments 2 min read LW link

(hodoscope.dev)

Carrot-Parsnip: A Social Deduction Game for LLM Evals

Bicuspid Valve 20 Feb 2026 23:06 UTC

11 points

0 comments 7 min read LW link

Can Current AI Match (or Outmatch) Professionals in Economically Valuable Tasks?

saahir.vazirani 20 Feb 2026 21:38 UTC

6 points

0 comments 5 min read LW link

METR’s 14h 50% Horizon Impacts The Economy More Than ASI Timelines

Michaël Trazzi 20 Feb 2026 21:08 UTC

45 points

11 comments 2 min read LW link

New video from Palisade Research: No One Understands Why AI Works

peterbarnett 20 Feb 2026 20:29 UTC

62 points

2 comments 1 min read LW link

(www.youtube.com)

Announcing: Iliad Intensive + Iliad Fellowship

David Udell and Alexander Gietelink Oldenziel

20 Feb 2026 20:13 UTC

82 points

15 comments 1 min read LW link

ARENA 8.0 - Call for Applicants

JScriven, JamesH, David Quarel and CallumMcDougall

20 Feb 2026 18:28 UTC

31 points

1 comment 6 min read LW link

Militaries are going autonomous. But will AI lead to new wars? A tour of recent research

Mordechai Rorvig 20 Feb 2026 18:26 UTC

1 point

0 comments 2 min read LW link

(www.foommagazine.org)

Unprecedented Catastrophes Have Non-Canonical Probabilities

E.G. Blee-Goldman 20 Feb 2026 18:23 UTC

6 points

2 comments 14 min read LW link

Mechanistic Interpretability of Biological Foundation Models

Ihor Kendiukhov 20 Feb 2026 18:01 UTC

34 points

1 comment 26 min read LW link

On Steven Byrnes’ ruthless ASI, (dis)analogies with humans and alignment proposals

StanislavKrym 20 Feb 2026 15:32 UTC

9 points

2 comments 2 min read LW link

Some Questions For Democrats About Epstein

Alexander Turok 20 Feb 2026 15:24 UTC

−28 points

3 comments 4 min read LW link

AGI is Here

Gordon Seidoh Worley 20 Feb 2026 15:21 UTC

68 points

39 comments 2 min read LW link

Mind the Gap

Bridgett Kay 20 Feb 2026 14:35 UTC

6 points

0 comments 5 min read LW link

(dxmrevealed.wordpress.com)

AI #156 Part 2: Errors in Rhetoric

Zvi 20 Feb 2026 14:31 UTC

45 points

0 comments 32 min read LW link

(thezvi.wordpress.com)

AI for societal decision making—How promising is the space? 80,000 Hours profile

Zershaaneh Qureshi 20 Feb 2026 13:28 UTC

3 points

0 comments 2 min read LW link

How To Escape Super Mario Bros

omegastick 20 Feb 2026 11:54 UTC

70 points

8 comments 9 min read LW link

(dumbideas.xyz)

Human Fine-Tuning

PranavG and Gabriel Alfour

20 Feb 2026 10:20 UTC

3 points

0 comments 16 min read LW link

(cognition.cafe)

The Problem of Counterevidence and the Futility of Theodicy

Ape in the coat 20 Feb 2026 7:36 UTC

2 points

6 comments 4 min read LW link

(substack.com)

A Claude Skill To Comment On Docs

Tim Hua 20 Feb 2026 2:28 UTC

26 points

1 comment 2 min read LW link

Cooperationism: first draft for a moral framework that does not require consciousness

Épiphanie Gédéon 19 Feb 2026 21:07 UTC

26 points

5 comments 8 min read LW link

Flamingos (among other things) reduce emergent misalignment

eekay 19 Feb 2026 19:17 UTC

13 points

3 comments 7 min read LW link

Funkering!

flying buttress 19 Feb 2026 18:14 UTC

13 points

0 comments 1 min read LW link

Subjectivity vs Agency: AI “Waking Up”?

Jonathan Moregård 19 Feb 2026 17:19 UTC

4 points

0 comments 5 min read LW link

(honestliving.substack.com)

You May Already Be Canadian

jefftk 19 Feb 2026 16:00 UTC

120 points

14 comments 1 min read LW link

(www.jefftk.com)

AI Researchers and Executives Continue to Underestimate the Near-Future Risks of Open Models

Andrew Dickson 19 Feb 2026 15:56 UTC

23 points

1 comment 16 min read LW link

AI #156 Part 1: They Do Mean The Effect On Jobs

Zvi 19 Feb 2026 14:20 UTC

53 points

7 comments 36 min read LW link

(thezvi.wordpress.com)

Terminal Cynicism

PranavG and Gabriel Alfour

19 Feb 2026 13:51 UTC

24 points

25 comments 10 min read LW link

(cognition.cafe)

How much information does an optimal policy contain about its environment?

Alfred Harwood, Alex_Altair and JoseFaustino

19 Feb 2026 13:05 UTC

30 points

0 comments 10 min read LW link

All hands on deck to build the datacenter lie detector

Naci Cankaya 19 Feb 2026 11:42 UTC

32 points

2 comments 5 min read LW link

(open.substack.com)

A Technical Primer on Mechanistic Interpretability

Alexei G 19 Feb 2026 7:42 UTC

1 point

0 comments 11 min read LW link

(alexeigannon.com)

Power Laws Are Not Enough

CarolusRenniusVitellius 19 Feb 2026 4:31 UTC

10 points

3 comments 4 min read LW link

(charlesr-w.github.io)

Be skeptical of milestone announcements by young AI startups

lc 19 Feb 2026 4:19 UTC

25 points

0 comments 3 min read LW link

Opus 4.5 made a biodevice (w me)

Raye 19 Feb 2026 2:31 UTC

23 points

0 comments 10 min read LW link

Review of If Anyone Builds It, Everyone Dies

James Brobin 19 Feb 2026 1:53 UTC

23 points

4 comments 5 min read LW link

I want to actually get good at forecasting this year (Group Invite)

Vojtech Brynych 19 Feb 2026 1:41 UTC

12 points

4 comments 1 min read LW link

Does GPT-2 Represent Controversy? A Small Mech Interp Investigation

CharlesL 19 Feb 2026 1:36 UTC

6 points

0 comments 2 min read LW link