Using fiction to imagine a pathway to friendly AGI

Rick Moss23 Feb 2026 23:48 UTC

3 points

0 comments2 min readLW link

When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Max Fomin23 Feb 2026 23:44 UTC

1 point

2 comments6 min readLW link

The persona selection model

Sam Marks23 Feb 2026 22:56 UTC

176 points

53 comments43 min readLW link

(alignment.anthropic.com)

Agenda Reflection: Testing Automated Alignment

Ariel_23 Feb 2026 21:53 UTC

11 points

0 comments2 min readLW link

(zenodo.org)

Claude Sonnet 4.6 Gives You Flexibility

Zvi23 Feb 2026 20:30 UTC

29 points

1 comment9 min readLW link

(thezvi.wordpress.com)

Secrets of the LessWrong RSS Feed

Brendan Long23 Feb 2026 20:12 UTC

36 points

6 comments4 min readLW link

Which questions can’t we punt?

Lizka23 Feb 2026 19:17 UTC

39 points

2 comments15 min readLW link

Exponential GDP growth from linear growth in variety of goods

Will_Howard23 Feb 2026 18:50 UTC

4 points

2 comments5 min readLW link

(open.substack.com)

Pre-training data poisoning likely makes installing secret loyalties easier

Joe Kwon23 Feb 2026 18:12 UTC

12 points

0 comments4 min readLW link

The 2028 Global Intelligence Crisis—a finance-oriented vignette

Rasool23 Feb 2026 17:12 UTC

50 points

13 comments1 min readLW link

(www.citriniresearch.com)

AI Impact Summit 2026: A Field Report

Aditya and bhishma

23 Feb 2026 16:58 UTC

38 points

1 comment9 min readLW link

The map of the map is not the map

jimmy23 Feb 2026 16:54 UTC

18 points

3 comments9 min readLW link

Fact-checking an AI optimist article in The Economist

ToSummarise23 Feb 2026 13:56 UTC

41 points

3 comments4 min readLW link

(www.tosummarise.com)

Review: “We can’t disagree forever”

Martin Randall23 Feb 2026 13:17 UTC

15 points

0 comments3 min readLW link

Why I Think Pause is Impossible

E.G. Blee-Goldman23 Feb 2026 11:58 UTC

1 point

4 comments6 min readLW link

Can Aha Moments be Fake? Identifying True and Decorative Thinking Steps in CoT

Jiachen Zhao23 Feb 2026 11:51 UTC

24 points

0 comments10 min readLW link

(arxiv.org)

A World Without Violet: Peculiar Consequences of Granting Moral Status to Artificial Intelligences

Sever Topan23 Feb 2026 7:23 UTC

17 points

8 comments4 min readLW link

(severtopan.substack.com)

Was It Owl a Dream?

Yovel Rom23 Feb 2026 5:07 UTC

17 points

4 comments4 min readLW link

(yovelrom.substack.com)

Innate Immunity

joec23 Feb 2026 5:00 UTC

23 points

2 comments6 min readLW link

Why I Transitioned: A Third (FtM) Perspective

Character#273623 Feb 2026 4:39 UTC

22 points

6 comments14 min readLW link

The power of a simple 3-way truth scale

Bruce Lewis23 Feb 2026 2:41 UTC

4 points

2 comments2 min readLW link

Storing Food

jefftk23 Feb 2026 1:40 UTC

77 points

9 comments2 min readLW link

(www.jefftk.com)

Old SUNY Dorm Logic is not helping rural population collapse in NY.

Edd Schneider23 Feb 2026 1:28 UTC

9 points

4 comments3 min readLW link

Changing the world for the worse

mingyuan22 Feb 2026 23:55 UTC

129 points

17 comments3 min readLW link

(mingyuan.substack.com)

The Scalable Formal Oversight Research Program

Max von Hippel22 Feb 2026 22:40 UTC

34 points

4 comments9 min readLW link

Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry

wassname22 Feb 2026 22:12 UTC

18 points

0 comments5 min readLW link

A Dialectic on Classical Utilitarianism

James Brobin22 Feb 2026 19:32 UTC

1 point

1 comment2 min readLW link

My RSS Reader is Done

Brendan Long22 Feb 2026 19:06 UTC

36 points

2 comments1 min readLW link

(www.brendanlong.com)

What to Do About AGI

Gordon Seidoh Worley22 Feb 2026 19:00 UTC

18 points

1 comment2 min readLW link

Mapping LLM attractor states

Adam Bricknell22 Feb 2026 18:10 UTC

18 points

8 comments3 min readLW link

InsanityBench: Cryptic Puzzles as a Probe for Lateral Thinking

RobinHa22 Feb 2026 14:20 UTC

48 points

1 comment4 min readLW link

(www.robinhaselhorst.com)

The world won’t end, but we should be ashamed for trying

George3d622 Feb 2026 13:01 UTC

−20 points

0 comments12 min readLW link

(cerebralab.com)

First Forecasting Dojo Group Meetup

Vojtech Brynych22 Feb 2026 7:19 UTC

3 points

2 comments1 min readLW link

Life’s paradox and AI’s accentuation of it

geyab4661722 Feb 2026 4:50 UTC

−1 points

0 comments3 min readLW link

Multiple Independent Semantic Axes in Gemma 3 270M

CharlesL22 Feb 2026 1:55 UTC

15 points

2 comments3 min readLW link

A Taxonomy of Traces

aleph_four22 Feb 2026 1:28 UTC

0 points

0 comments10 min readLW link

Hierarchical Goal Induction With Ethics

aleph_four22 Feb 2026 0:53 UTC

3 points

0 comments4 min readLW link

Did Claude 3 Opus align itself via gradient hacking?

Fiora Starlight21 Feb 2026 22:24 UTC

391 points

49 comments20 min readLW link

If you don’t feel deeply confused about AGI risk, something’s wrong

Dave Banerjee21 Feb 2026 15:34 UTC

95 points

18 comments5 min readLW link

(open.substack.com)

Ponzi schemes as a demonstration of out-of-distribution generalization

TFD21 Feb 2026 13:19 UTC

9 points

2 comments6 min readLW link

(www.thefloatingdroid.com)

LLMs and Literature: Where Value Actually Comes From

derelict543221 Feb 2026 13:16 UTC

13 points

13 comments4 min readLW link

The Spectre haunting the “AI Safety” Community

Gabriel Alfour21 Feb 2026 11:14 UTC

233 points

28 comments6 min readLW link

(cognition.cafe)

LessWrong’s goals overlap HowTruthful’s

Bruce Lewis21 Feb 2026 4:19 UTC

7 points

4 comments2 min readLW link

Alignment to Evil

Matrice Jacobine21 Feb 2026 3:29 UTC

61 points

12 comments1 min readLW link

(tetraspace.substack.com)

Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?

RogerDearnaley21 Feb 2026 1:59 UTC

40 points

4 comments5 min readLW link

Robert Sapolsky Is Simply Not Talking About Compatibilism

Julius21 Feb 2026 1:27 UTC

26 points

4 comments8 min readLW link

(thegreymatter.substack.com)

TT Self Study Journal # 7

TristanTrim21 Feb 2026 1:22 UTC

13 points

2 comments4 min readLW link

How will we do SFT on models with opaque reasoning?

Alek Westover, Vivek Hebbar and egan

21 Feb 2026 0:00 UTC

32 points

17 comments7 min readLW link

Agent-first context menus

Surya Kasturi20 Feb 2026 23:45 UTC

3 points

1 comment2 min readLW link

Human perception of relational knowledge on graphical interfaces

Surya Kasturi20 Feb 2026 23:45 UTC

3 points

1 comment1 min readLW link