
Changing the world for the worse

mingyuan 22 Feb 2026 23:55 UTC

129 points

17 comments 3 min read LW link

(mingyuan.substack.com)

The Scalable Formal Oversight Research Program

Max von Hippel 22 Feb 2026 22:40 UTC

34 points

4 comments 9 min read LW link

Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry

wassname 22 Feb 2026 22:12 UTC

18 points

0 comments 5 min read LW link

A Dialectic on Classical Utilitarianism

James Brobin 22 Feb 2026 19:32 UTC

1 point

1 comment 2 min read LW link

My RSS Reader is Done

Brendan Long 22 Feb 2026 19:06 UTC

36 points

2 comments 1 min read LW link

(www.brendanlong.com)

What to Do About AGI

Gordon Seidoh Worley 22 Feb 2026 19:00 UTC

18 points

1 comment 2 min read LW link

Mapping LLM attractor states

Adam Bricknell 22 Feb 2026 18:10 UTC

18 points

8 comments 3 min read LW link

InsanityBench: Cryptic Puzzles as a Probe for Lateral Thinking

RobinHa 22 Feb 2026 14:20 UTC

48 points

1 comment 4 min read LW link

(www.robinhaselhorst.com)

The world won’t end, but we should be ashamed for trying

George3d6 22 Feb 2026 13:01 UTC

−20 points

0 comments 12 min read LW link

(cerebralab.com)

First Forecasting Dojo Group Meetup

Vojtech Brynych 22 Feb 2026 7:19 UTC

3 points

2 comments 1 min read LW link

Life’s paradox and AI’s accentuation of it

geyab46617 22 Feb 2026 4:50 UTC

−1 points

0 comments 3 min read LW link

Multiple Independent Semantic Axes in Gemma 3 270M

CharlesL 22 Feb 2026 1:55 UTC

15 points

2 comments 3 min read LW link

A Taxonomy of Traces

aleph_four 22 Feb 2026 1:28 UTC

0 points

0 comments 10 min read LW link

Hierarchical Goal Induction With Ethics

aleph_four 22 Feb 2026 0:53 UTC

3 points

0 comments 4 min read LW link

Did Claude 3 Opus align itself via gradient hacking?

Fiora Starlight 21 Feb 2026 22:24 UTC

391 points

49 comments 20 min read LW link

If you don’t feel deeply confused about AGI risk, something’s wrong

Dave Banerjee 21 Feb 2026 15:34 UTC

95 points

18 comments 5 min read LW link

(open.substack.com)

Ponzi schemes as a demonstration of out-of-distribution generalization

TFD 21 Feb 2026 13:19 UTC

9 points

2 comments 6 min read LW link

(www.thefloatingdroid.com)

LLMs and Literature: Where Value Actually Comes From

derelict5432 21 Feb 2026 13:16 UTC

13 points

13 comments 4 min read LW link

The Spectre haunting the “AI Safety” Community

Gabriel Alfour 21 Feb 2026 11:14 UTC

233 points

28 comments 6 min read LW link

(cognition.cafe)

LessWrong’s goals overlap HowTruthful’s

Bruce Lewis 21 Feb 2026 4:19 UTC

7 points

4 comments 2 min read LW link

Alignment to Evil

Matrice Jacobine 21 Feb 2026 3:29 UTC

61 points

12 comments 1 min read LW link

(tetraspace.substack.com)

Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?

RogerDearnaley 21 Feb 2026 1:59 UTC

40 points

4 comments 5 min read LW link

Robert Sapolsky Is Simply Not Talking About Compatibilism

Julius 21 Feb 2026 1:27 UTC

26 points

4 comments 8 min read LW link

(thegreymatter.substack.com)

TT Self Study Journal # 7

TristanTrim 21 Feb 2026 1:22 UTC

13 points

2 comments 4 min read LW link

How will we do SFT on models with opaque reasoning?

Alek Westover, Vivek Hebbar and egan

21 Feb 2026 0:00 UTC

32 points

17 comments 7 min read LW link

Agent-first context menus

Surya Kasturi 20 Feb 2026 23:45 UTC

3 points

1 comment 2 min read LW link

Human perception of relational knowledge on graphical interfaces

Surya Kasturi 20 Feb 2026 23:45 UTC

3 points

1 comment 1 min read LW link

Hodoscope: Visualization for Efficient Human Supervision

Ziqian Zhong and Shashwat Saxena

20 Feb 2026 23:41 UTC

9 points

0 comments 2 min read LW link

(hodoscope.dev)

Carrot-Parsnip: A Social Deduction Game for LLM Evals

Bicuspid Valve 20 Feb 2026 23:06 UTC

11 points

0 comments 7 min read LW link

Can Current AI Match (or Outmatch) Professionals in Economically Valuable Tasks?

saahir.vazirani 20 Feb 2026 21:38 UTC

6 points

0 comments 5 min read LW link

METR’s 14h 50% Horizon Impacts The Economy More Than ASI Timelines

Michaël Trazzi 20 Feb 2026 21:08 UTC

45 points

11 comments 2 min read LW link

New video from Palisade Research: No One Understands Why AI Works

peterbarnett 20 Feb 2026 20:29 UTC

62 points

2 comments 1 min read LW link

(www.youtube.com)

Announcing: Iliad Intensive + Iliad Fellowship

David Udell and Alexander Gietelink Oldenziel

20 Feb 2026 20:13 UTC

82 points

16 comments 1 min read LW link

ARENA 8.0 - Call for Applicants

JScriven, JamesH, David Quarel and CallumMcDougall

20 Feb 2026 18:28 UTC

31 points

1 comment 6 min read LW link

Militaries are going autonomous. But will AI lead to new wars? A tour of recent research

Mordechai Rorvig 20 Feb 2026 18:26 UTC

1 point

0 comments 2 min read LW link

(www.foommagazine.org)

Unprecedented Catastrophes Have Non-Canonical Probabilities

E.G. Blee-Goldman 20 Feb 2026 18:23 UTC

6 points

2 comments 14 min read LW link

Mechanistic Interpretability of Biological Foundation Models

Ihor Kendiukhov 20 Feb 2026 18:01 UTC

34 points

1 comment 26 min read LW link

On Steven Byrnes’ ruthless ASI, (dis)analogies with humans and alignment proposals

StanislavKrym 20 Feb 2026 15:32 UTC

9 points

2 comments 2 min read LW link

Some Questions For Democrats About Epstein

Alexander Turok 20 Feb 2026 15:24 UTC

−28 points

3 comments 4 min read LW link

AGI is Here

Gordon Seidoh Worley 20 Feb 2026 15:21 UTC

68 points

39 comments 2 min read LW link

Mind the Gap

Bridgett Kay 20 Feb 2026 14:35 UTC

6 points

0 comments 5 min read LW link

(dxmrevealed.wordpress.com)

AI #156 Part 2: Errors in Rhetoric

Zvi 20 Feb 2026 14:31 UTC

45 points

0 comments 32 min read LW link

(thezvi.wordpress.com)

AI for societal decision making—How promising is the space? 80,000 Hours profile

Zershaaneh Qureshi 20 Feb 2026 13:28 UTC

3 points

0 comments 2 min read LW link

How To Escape Super Mario Bros

omegastick 20 Feb 2026 11:54 UTC

70 points

8 comments 9 min read LW link

(dumbideas.xyz)

Human Fine-Tuning

PranavG and Gabriel Alfour

20 Feb 2026 10:20 UTC

3 points

0 comments 16 min read LW link

(cognition.cafe)

The Problem of Counterevidence and the Futility of Theodicy

Ape in the coat 20 Feb 2026 7:36 UTC

2 points

6 comments 4 min read LW link

(substack.com)

A Claude Skill To Comment On Docs

Tim Hua 20 Feb 2026 2:28 UTC

26 points

1 comment 2 min read LW link

Cooperationism: first draft for a moral framework that does not require consciousness

Épiphanie Gédéon 19 Feb 2026 21:07 UTC

26 points

5 comments 8 min read LW link

Flamingos (among other things) reduce emergent misalignment

eekay 19 Feb 2026 19:17 UTC

13 points

3 comments 7 min read LW link

Funkering!

flying buttress 19 Feb 2026 18:14 UTC

13 points

0 comments 1 min read LW link