AI-202X: a game between humans and AGIs aligned to different futures?

StanislavKrym

1 Jul 2025 23:37 UTC

5 points

0 comments, 16 min read

Aether July 2025 Update

RohanS, Rauno Arike and Shubhorup Biswas

1 Jul 2025 21:08 UTC

26 points

7 comments, 3 min read

AI Moratorium Stripped From BBB

Zvi

1 Jul 2025 18:50 UTC

70 points

4 comments, 6 min read

(thezvi.wordpress.com)

Manipulating Self-Preference In LLMs

Matthew Nguyen, Jou Barzdukas, Matthew Bozoukov and Hongyu Fu

1 Jul 2025 18:03 UTC

13 points

0 comments, 7 min read

A Simple Explanation of AGI Risk

TurnTrout

1 Jul 2025 16:18 UTC

58 points

4 comments, 5 min read

(turntrout.com)

Authors Have a Responsibility to Communicate Clearly

TurnTrout

1 Jul 2025 15:41 UTC

127 points

29 comments, 6 min read

(turntrout.com)

Road to AnimalHarmBench

Arturs and Constance Li

1 Jul 2025 13:38 UTC

−1 points

0 comments, 1 min read

(forum.effectivealtruism.org)

Embedded Altruism [slides]

owencb

1 Jul 2025 13:02 UTC

22 points

3 comments, 1 min read

Senate Strikes Potential AI Moratorium

T_W

1 Jul 2025 11:49 UTC

16 points

0 comments, 1 min read

(www.reuters.com)

[Question] Can AIs be shown their messages aren’t tampered with?

mruwnik

1 Jul 2025 9:39 UTC

4 points

10 comments, 1 min read

SLT for AI Safety

Jesse Hoogland

1 Jul 2025 4:52 UTC

78 points

0 comments, 3 min read

Problematic Professors

Eggs

1 Jul 2025 2:54 UTC

16 points

5 comments, 2 min read

I can’t tell if my ideas are good anymore because I talked to robots too much

Tyson

30 Jun 2025 21:21 UTC

13 points

10 comments, 1 min read

Q1 AI Benchmark Results: Pro Forecasters Crush Bots

Ben Wilson

30 Jun 2025 21:12 UTC

14 points

0 comments, 22 min read

(www.metaculus.com)

ACX Meetup Cape Town

tegan

30 Jun 2025 21:11 UTC

1 point

0 comments, 1 min read

The best simple argument for Pausing AI?

Gary Marcus

30 Jun 2025 20:38 UTC

155 points

23 comments, 1 min read

Hiring* an AI** Artist for LessWrong/Lightcone

Raemon

30 Jun 2025 19:01 UTC

30 points

8 comments, 1 min read

SAE on activation differences

Santiago Aranguri, jacob_drori and Neel Nanda

30 Jun 2025 17:50 UTC

45 points

3 comments, 5 min read

The Spectrum of Attention: From Empathy to Hypnosis

jimmy

30 Jun 2025 17:42 UTC

14 points

2 comments, 14 min read

Substack and Other Blog Recommendations

Zvi

30 Jun 2025 17:20 UTC

30 points

7 comments, 16 min read

(thezvi.wordpress.com)

What We Learned Trying to Diff Base and Chat Models (And Why It Matters)

Clément Dumas, Julian Minder and Neel Nanda

30 Jun 2025 17:17 UTC

106 points

2 comments, 7 min read

Don’t Eat Honey

Bentham's Bulldog

30 Jun 2025 15:57 UTC

−15 points

70 comments, 6 min read

Primary-budget voting registration

eg

30 Jun 2025 15:39 UTC

1 point

4 comments, 2 min read

Project Vend: Can Claude run a small shop?

Gunnar_Zarncke

30 Jun 2025 15:22 UTC

53 points

8 comments, 1 min read

(www.anthropic.com)

If you want to be vegan but you worry about health effects of no meat, consider being vegan except for mussels/oysters

KatWoods

30 Jun 2025 13:28 UTC

80 points

15 comments, 1 min read

How dangerous is encoded reasoning?

Artem Karpov

30 Jun 2025 11:54 UTC

17 points

0 comments, 10 min read

Circuits in Superposition 2: Now with Less Wrong Math

Linda Linsefors and Lucius Bushnaq

30 Jun 2025 10:25 UTC

73 points

0 comments, 22 min read

life lessons from poker

thiccythot

30 Jun 2025 4:20 UTC

63 points

14 comments, 4 min read

From Diamond Mining to Open-World Survival: Alignment and Emergent Behavior in Minecraft Agents

Sunishchal Dev, muggleschoolbus, Melvin Huang, hassandawy, veda_duddu, vasusharma55 and Kevin Zhu

30 Jun 2025 3:17 UTC

15 points

0 comments, 16 min read

When Machines Do Our Jobs, Will We Remember How to Live?

Ahmed Elsayyad

30 Jun 2025 3:03 UTC

4 points

1 comment, 3 min read

Paradigms for computation

Cole Wyeth

30 Jun 2025 0:37 UTC

67 points

10 comments, 12 min read

The Internet Is Like a City (But Not in the Way You’d Think)

antonomon

29 Jun 2025 22:25 UTC

20 points

0 comments, 8 min read

(novum.substack.com)

Scientific Discovery in the Age of Artificial Intelligence

Jessica Rumbelow

29 Jun 2025 20:45 UTC

42 points

3 comments, 10 min read

An Alternative Way to Forecast AGI: Counting Down Capabilities

shash42

29 Jun 2025 19:52 UTC

3 points

0 comments, 3 min read

(open.substack.com)

Is Optimal Reflection Competitive with Extinction Risk Reduction? - Requesting Reviewers

Jordan Arel

29 Jun 2025 18:42 UTC

7 points

0 comments, 11 min read

Let’s look at another “LLMs lack true understanding” paper

Expertium

29 Jun 2025 14:00 UTC

3 points

0 comments, 4 min read

I underestimated safety research speedups from safe AI

Dan Braun

29 Jun 2025 13:29 UTC

38 points

2 comments, 3 min read

Inflight Auctions

jefftk

29 Jun 2025 12:10 UTC

12 points

1 comment, 2 min read

(www.jefftk.com)

Do Self-Perceived Superintelligent LLMs Exhibit Misalignment?

Dave Banerjee

29 Jun 2025 11:06 UTC

29 points

3 comments, 12 min read

(davebanerjee.xyz)

Conciseness Manifesto

Vasyl Dotsenko

29 Jun 2025 5:33 UTC

35 points

5 comments, 1 min read

Feedback wanted: Shortlist of AI safety ideas

mmKALLL

29 Jun 2025 4:28 UTC

8 points

3 comments, 5 min read

Build Your Exoskeleton

mrmoxon

29 Jun 2025 1:54 UTC

1 point

0 comments, 9 min read

Why Reasoning Isn’t Enough: How LLM Agents Struggle with Ethics and Cooperation

Zhijing Jin, David Guzman Piedrahita, Yongjin Yang and Steffen Backmann

28 Jun 2025 20:43 UTC

6 points

0 comments, 4 min read

Support for bedrock liberal principles seems to be in pretty bad shape these days

Max H

28 Jun 2025 20:37 UTC

32 points

52 comments, 4 min read

A Depressed Shrink Tries Shrooms

AlphaAndOmega

28 Jun 2025 17:16 UTC

45 points

11 comments, 1 min read

(open.substack.com)

Time Machine as Existential Risk

avturchin

28 Jun 2025 15:17 UTC

15 points

7 comments, 45 min read

The next wave of model improvements will be due to data quality

ChristianKl

28 Jun 2025 14:34 UTC

17 points

4 comments, 1 min read

AXRP Episode 44 - Peter Salib on AI Rights for Human Safety

DanielFilan

28 Jun 2025 1:40 UTC

12 points

0 comments, 103 min read

Prediction Markets Have an Anthropic Bias to Deal With

ar-sht

28 Jun 2025 1:16 UTC

7 points

1 comment, 11 min read

Emergent Misalignment & Realignment

LizaT, JasperTimm, KevinWei and David Quarel

27 Jun 2025 21:31 UTC

46 points

1 comment, 17 min read