Towards Alignment Auditing as a Numbers-Go-Up Science

Sam Marks · 4 Aug 2025 22:30 UTC

127 points

15 comments · 6 min read

It turns out that DNNs are remarkably interpretable.

Maciej Satkiewicz · 4 Aug 2025 22:18 UTC

12 points

8 comments · 1 min read

(arxiv.org)

Dissolving moral philosophy: from pain to meta-ethics

Charbel-Raphaël · 4 Aug 2025 20:20 UTC

8 points

3 comments · 2 min read

Navigating Security: Fighting flammability with fire (when safe)

jimmy · 4 Aug 2025 19:58 UTC

4 points

4 comments · 16 min read

ACX Atlanta August Meetup

Steve French · 4 Aug 2025 19:52 UTC

2 points

0 comments · 1 min read

Permanent Disempowerment is the Baseline

Vladimir_Nesov · 4 Aug 2025 17:43 UTC

93 points

23 comments · 6 min read

Exploring entropy gradient propulsion via the Casimir Effect

bobdavis62 · 4 Aug 2025 15:48 UTC

8 points

14 comments · 1 min read

If you can generate obfuscated chain-of-thought, can you monitor it?

Asa Cooper Stickland and Tomek Korbak · 4 Aug 2025 15:46 UTC

36 points

6 comments · 11 min read

On Altman’s Interview With Theo Von

Zvi · 4 Aug 2025 15:10 UTC

41 points

1 comment · 9 min read

(thezvi.wordpress.com)

Should we aim for flourishing over mere survival? The Better Futures series.

wdmacaskill · 4 Aug 2025 14:28 UTC

65 points

8 comments · 5 min read

Luna Lovegood and the Chamber of Secrets, Part 8

Kongo Landwalker and lsusr · 4 Aug 2025 10:28 UTC

2 points

0 comments · 2 min read

Framework I made for general “productivity”

Mark Wang · 4 Aug 2025 8:40 UTC

4 points

2 comments · 1 min read

🫵YOU🫵 get to help the AGI Safety Act in Congress! This is real!

Wes R · 4 Aug 2025 3:13 UTC

10 points

5 comments · 1 min read

Saying Goodbye

sapphire · 3 Aug 2025 23:52 UTC

81 points

75 comments · 4 min read

[Linkpost] Avatar’s Dirty Secret: Nature Is Just Fancy Infrastructure

AlphaAndOmega · 3 Aug 2025 19:37 UTC

15 points

2 comments · 1 min read

(open.substack.com)

[Question] How to tolerate boredom?

tryhard1000 · 3 Aug 2025 17:16 UTC

7 points

3 comments · 1 min read

Persona Vectors—Anthropic Paper

Stephen Martin · 3 Aug 2025 16:11 UTC

11 points

3 comments · 1 min read

(www.anthropic.com)

Alcohol is so bad for society that you should probably stop drinking

KatWoods · 3 Aug 2025 15:31 UTC

39 points

29 comments · 8 min read

Explosive growth from substitution: the case of the Industrial Revolution

ParrotRobot · 3 Aug 2025 7:52 UTC

9 points

1 comment · 5 min read

Emotions Make Sense

DaystarEld · 3 Aug 2025 7:03 UTC

212 points

43 comments · 25 min read

(daystareld.com)

Creative writing with LLMs, part 2: Co-writing techniques

Kaj_Sotala · 3 Aug 2025 6:44 UTC

8 points

4 comments · 18 min read

The Ethics of Copying Conscious States and the Many-Worlds Interpretation of Quantum Mechanics

TobyC · 2 Aug 2025 22:48 UTC

15 points

6 comments · 27 min read

Astronomical Waste & Conscientious Objection

Lydia Nottingham · 2 Aug 2025 22:37 UTC

8 points

1 comment · 2 min read

The Inkhaven Residency

Ben Pace · 2 Aug 2025 18:51 UTC

137 points

39 comments · 3 min read

[Question] Feedback request: `eval-crypt` a simple utility to mitigate eval contamination.

Matan Shtepel, Justin Olive and Daniel Polatajko · 2 Aug 2025 17:04 UTC

9 points

4 comments · 2 min read

The Observer Effect for belief measurement

Roman Malov · 2 Aug 2025 13:57 UTC

9 points

4 comments · 2 min read

Many prediction markets would be better off as batched auctions

William Howard · 2 Aug 2025 12:04 UTC

177 points

21 comments · 5 min read

(antidiluvian.substack.com)

2025 ACX Grants project pitches

duck_master · 2 Aug 2025 5:04 UTC

2 points

2 comments · 1 min read

The deep history of intelligence

Dan MacKinlay · 2 Aug 2025 4:04 UTC

10 points

0 comments · 1 min read

(danmackinlay.name)

How many species has humanity driven extinct?

Raemon · 2 Aug 2025 2:50 UTC

42 points

9 comments · 1 min read

[Question] At what point do you abandon ship?

Gesild Muka · 2 Aug 2025 1:13 UTC

7 points

4 comments · 1 min read

Three Quotes on Transformative Technology

Chris_Leong · 1 Aug 2025 22:57 UTC

8 points

3 comments · 1 min read

SB-1047 Documentary: The Post-Mortem

Michaël Trazzi · 1 Aug 2025 21:42 UTC

130 points

0 comments · 5 min read

Persona vectors: monitoring and controlling character traits in language models

RunjinChen and Andy Arditi · 1 Aug 2025 21:19 UTC

26 points

3 comments · 5 min read

(arxiv.org)

Boots theory and Wikipedia

philh · 1 Aug 2025 20:30 UTC

9 points

12 comments · 12 min read

(reasonableapproximation.net)

Podcast: Lincoln Quirk from Wave

Elizabeth · 1 Aug 2025 19:00 UTC

41 points

1 comment · 1 min read

(acesounderglass.com)

AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Fernando Rosas · 1 Aug 2025 18:37 UTC

34 points

3 comments · 15 min read

The Dark Arts As A Scaffolding Skill For Rationality

Screwtape · 1 Aug 2025 17:12 UTC

85 points

25 comments · 7 min read

Steve Petersen seeking funding

abramdemski · 1 Aug 2025 17:03 UTC

87 points

0 comments · 1 min read

The Week in AI Governance

Zvi · 1 Aug 2025 12:20 UTC

18 points

1 comment · 24 min read

(thezvi.wordpress.com)

Research Areas in AI Control (The Alignment Project by UK AISI)

Julian Stastny, Tomek Korbak, Mojmir, Buck and Alan Cooney · 1 Aug 2025 10:27 UTC

25 points

0 comments · 18 min read