Looking for judges for critiques of Alignment Plans

Iknownothing

17 Aug 2023 22:35 UTC

6 points

0 comments

1 min read

How is ChatGPT’s behavior changing over time?

worse

17 Aug 2023 20:54 UTC

3 points

0 comments

1 min read

(arxiv.org)

Progress links digest, 2023-08-17: Cloud seeding, robotic sculptors, and rogue planets

jasoncrawford

17 Aug 2023 20:29 UTC

15 points

1 comment

4 min read

(rootsofprogress.org)

Model of psychosis, take 2

Steven Byrnes

17 Aug 2023 19:11 UTC

34 points

14 comments

4 min read

[Linkpost] Robustified ANNs Reveal Wormholes Between Human Category Percepts

Bogdan Ionut Cirstea

17 Aug 2023 19:10 UTC

6 points

2 comments

1 min read

Against Almost Every Theory of Impact of Interpretability

Charbel-Raphaël

17 Aug 2023 18:44 UTC

336 points

93 comments

26 min read

2 reviews

Goldilocks and the Three Optimisers

dkl9

17 Aug 2023 18:15 UTC

−10 points

0 comments

5 min read

(dkl9.net)

Announcing Foresight Institute’s AI Safety Grants Program

Allison Duettmann

17 Aug 2023 17:34 UTC

35 points

2 comments

1 min read

The Negentropy Cliff

mephistopheles

17 Aug 2023 17:08 UTC

6 points

10 comments

1 min read

“AI Wellbeing” and the Ongoing Debate on Phenomenal Consciousness

FlorianH

17 Aug 2023 15:47 UTC

10 points

6 comments

7 min read

AI #25: Inflection Point

Zvi

17 Aug 2023 14:40 UTC

59 points

9 comments

36 min read

(thezvi.wordpress.com)

[Question] Why might General Intelligences have long term goals?

yrimon

17 Aug 2023 14:10 UTC

3 points

17 comments

1 min read

Understanding Counterbalanced Subtractions for Better Activation Additions

ojorgensen

17 Aug 2023 13:53 UTC

21 points

0 comments

14 min read

Reflections on “Making the Atomic Bomb”

Boaz Barak

17 Aug 2023 2:48 UTC

51 points

7 comments

8 min read

Autonomous replication and adaptation: an attempt at a concrete danger threshold

Hjalmar_Wijk

17 Aug 2023 1:31 UTC

45 points

1 comment

13 min read

[Question] (Thought experiment) If you had to choose, which would you prefer?

kuira

17 Aug 2023 0:57 UTC

9 points

2 comments

1 min read

Some rules for life (v.0,0)

Neil

17 Aug 2023 0:43 UTC

48 points

13 comments

12 min read

(neilwarren.substack.com)

When AI critique works even with misaligned models

Fabien Roger

17 Aug 2023 0:12 UTC

23 points

0 comments

2 min read

Book Launch: “The Carving of Reality,” Best of LessWrong vol. III

Raemon

16 Aug 2023 23:52 UTC

131 points

22 comments

5 min read

If we had known the atmosphere would ignite

Jeffs

16 Aug 2023 20:28 UTC

59 points

64 comments

2 min read

Stampy’s AI Safety Info—New Distillations #4 [July 2023]

markov

16 Aug 2023 19:03 UTC

22 points

10 comments

1 min read

(aisafety.info)

A Proof of Löb’s Theorem using Computability Theory

jessicata

16 Aug 2023 18:57 UTC

79 points

0 comments

17 min read

(unstableontology.com)

Summary of and Thoughts on the Hotz/Yudkowsky Debate

Zvi

16 Aug 2023 16:50 UTC

106 points

47 comments

9 min read

(thezvi.wordpress.com)

Red Pill vs Blue Pill, Bayes style

ErickBall

16 Aug 2023 15:23 UTC

28 points

33 comments

1 min read

What does it mean to “trust science”?

jasoncrawford

16 Aug 2023 14:56 UTC

34 points

9 comments

1 min read

(rootsofprogress.org)

Jason Crawford / The Roots of Progress in Bangalore, August 21 to September 8

jasoncrawford

16 Aug 2023 13:36 UTC

13 points

1 comment

1 min read

(rootsofprogress.org)

Gaining knowledge at a price

DavidMadsen

16 Aug 2023 10:21 UTC

−4 points

5 comments

1 min read

Understanding and visualizing sycophancy datasets

Nina Panickssery

16 Aug 2023 5:34 UTC

47 points

0 comments

6 min read

George Hotz vs Eliezer Yudkowsky AI Safety Debate—link and brief discussion

Gerald Monroe

16 Aug 2023 4:31 UTC

11 points

26 comments

2 min read

(www.youtube.com)

[Question] How to take advantage of the market’s irrationality regarding AGI?

GeneSmith

16 Aug 2023 3:30 UTC

24 points

7 comments

2 min read

Infinite Ethics: Infinite Problems

Bentham's Bulldog

16 Aug 2023 2:44 UTC

−2 points

25 comments

23 min read

Private Biostasis & Cryonics Social

Mati_Roy

16 Aug 2023 2:34 UTC

11 points

0 comments

1 min read

Some thoughts on George Hotz vs Eliezer Yudkowsky

TristanTrim

15 Aug 2023 23:33 UTC

11 points

3 comments

2 min read

Understanding the Information Flow inside Large Language Models

Felix Hofstätter and cozyfractal

15 Aug 2023 21:13 UTC

19 points

0 comments

17 min read

[Question] Any research in “probe-tuning” of LLMs?

Roman Leventov

15 Aug 2023 21:01 UTC

20 points

3 comments

1 min read

Can AI Transform the Electorate into a Citizen’s Assembly

RoscoHunter

15 Aug 2023 17:52 UTC

−3 points

5 comments

3 min read

Ten Thousand Years of Solitude

agp

15 Aug 2023 17:45 UTC

137 points

19 comments

4 min read

(www.discovermagazine.com)

AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge

Dan H

15 Aug 2023 16:10 UTC

21 points

0 comments

5 min read

(newsletter.safe.ai)

[Question] What is the most effective anti-tyranny charity?

lc

15 Aug 2023 15:26 UTC

20 points

10 comments

1 min read

My checklist for publishing a blog post

Steven Byrnes

15 Aug 2023 15:04 UTC

95 points

6 comments

4 min read

The Dunbar Playbook: A CRM system for your friends

Severin T. Seehrich

15 Aug 2023 8:44 UTC

32 points

16 comments

5 min read

(amoretlicentia.substack.com)

Optical Illusions are Out of Distribution Errors

vitaliya

15 Aug 2023 2:23 UTC

31 points

8 comments

2 min read

A short calculation about a Twitter poll

Ege Erdil

14 Aug 2023 19:48 UTC

64 points

64 comments

11 min read

Decomposing independent generalizations in neural networks via Hessian analysis

Dmitry Vaintrob and Nina Panickssery

14 Aug 2023 17:04 UTC

87 points

4 comments

1 min read

Memetic Judo #2: Incorporal Switches and Levers Compendium

Max TK

14 Aug 2023 16:53 UTC

19 points

6 comments

17 min read

Existentially relevant thought experiment: To kill or not to kill, a sniper, a man and a button.

AlexFromSafeTransition

14 Aug 2023 10:53 UTC

−18 points

6 comments

4 min read

Stepping down as moderator on LW

Kaj_Sotala

14 Aug 2023 10:46 UTC

83 points

1 comment

1 min read

Announcing Manifest 2023 (Sep 22-24 in Berkeley)

Saul Munn and Austin Chen

14 Aug 2023 5:13 UTC

31 points

0 comments

2 min read

Listen For What You Don’t Hear: The Case for Contrarianism

Yashvardhan Sharma

14 Aug 2023 2:53 UTC

1 point

1 comment

5 min read

Recipe: Hessian eigenvector computation for PyTorch models

Nina Panickssery

14 Aug 2023 2:48 UTC

32 points

5 comments

5 min read