An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet) · Jul 30, 2024, 9:44 PM
21 points
1 comment · 1 min read · LW link
(arxiv.org)

Open Source Automated Interpretability for Sparse Autoencoder Features

Jul 30, 2024, 9:11 PM
67 points
1 comment · 13 min read · LW link
(blog.eleuther.ai)

Caterpillars and Philosophy

Zero Contradictions · Jul 30, 2024, 8:54 PM
2 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

François Chollet on the limitations of LLMs in reasoning

2PuNCheeZ · Jul 30, 2024, 8:04 PM
1 point
1 comment · 2 min read · LW link
(x.com)

Against AI As An Existential Risk

Noah Birnbaum · Jul 30, 2024, 7:10 PM
6 points
13 comments · 1 min read · LW link
(irrationalitycommunity.substack.com)

[Question] Is objective morality self-defeating?

dialectica · Jul 30, 2024, 6:23 PM
−4 points
3 comments · 2 min read · LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten · Jul 30, 2024, 4:36 PM
6 points
0 comments · 9 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
223 points
51 comments · 12 min read · LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

Jul 30, 2024, 3:41 PM
32 points
0 comments · 15 min read · LW link

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?

scasper · Jul 30, 2024, 2:57 PM
25 points
0 comments · 4 min read · LW link

RTFB: California’s AB 3211

Zvi · Jul 30, 2024, 1:10 PM
62 points
2 comments · 11 min read · LW link
(thezvi.wordpress.com)

If You Can Climb Up, You Can Climb Down

jefftk · Jul 30, 2024, 12:00 AM
34 points
9 comments · 1 min read · LW link
(www.jefftk.com)

What is Morality?

Zero Contradictions · Jul 29, 2024, 7:19 PM
−1 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Arch-anarchism and immortality

Peter lawless · Jul 29, 2024, 6:10 PM
−5 points
1 comment · 2 min read · LW link

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering

Jul 29, 2024, 5:50 PM
17 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

New Blog Post Against AI Doom

Noah Birnbaum · Jul 29, 2024, 5:21 PM
1 point
5 comments · 1 min read · LW link
(substack.com)

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan · Jul 29, 2024, 2:50 PM
9 points
3 comments · 1 min read · LW link

[Question] How tokenization influences prompting?

Boris Kashirin · Jul 29, 2024, 10:28 AM
9 points
4 comments · 1 min read · LW link

Understanding Positional Features in Layer 0 SAEs

Jul 29, 2024, 9:36 AM
43 points
0 comments · 5 min read · LW link

Prediction Markets Explained

Benjamin_Sturisky · Jul 29, 2024, 8:02 AM
8 points
0 comments · 9 min read · LW link

Relativity Theory for What the Future ‘You’ Is and Isn’t

FlorianH · Jul 29, 2024, 2:01 AM
7 points
49 comments · 4 min read · LW link

Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought

cleanwhiteroom · Jul 28, 2024, 7:55 PM
2 points
2 comments · 2 min read · LW link

Making Beliefs Pay Rent

Jul 28, 2024, 5:59 PM
7 points
2 comments · 1 min read · LW link

This is already your second chance

Malmesbury · Jul 28, 2024, 5:13 PM
185 points
13 comments · 8 min read · LW link

[Question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?

kaler · Jul 28, 2024, 12:23 PM
10 points
14 comments · 1 min read · LW link

Family and Society

Zero Contradictions · Jul 28, 2024, 7:05 AM
1 point
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

[Question] What is AI Safety’s line of retreat?

Remmelt · Jul 28, 2024, 5:43 AM
12 points
12 comments · LW link

AXRP Episode 34 - AI Evaluations with Beth Barnes

DanielFilan · Jul 28, 2024, 3:30 AM
23 points
0 comments · 69 min read · LW link

Rats, Back a Candidate

Blake · Jul 28, 2024, 3:19 AM
−40 points
19 comments · 1 min read · LW link

AI existential risk probabilities are too unreliable to inform policy

Oleg Trott · Jul 28, 2024, 12:59 AM
18 points
5 comments · 1 min read · LW link
(www.aisnakeoil.com)

Idle Speculations on Pipeline Parallelism

DaemonicSigil · Jul 27, 2024, 10:40 PM
1 point
0 comments · 4 min read · LW link
(pbement.com)

Re: Anthropic’s suggested SB-1047 amendments

RobertM · Jul 27, 2024, 10:32 PM
87 points
13 comments · 9 min read · LW link
(www.documentcloud.org)

The problem with psychology is that it has no theory.

Nicholas D. · Jul 27, 2024, 7:36 PM
2 points
7 comments · 4 min read · LW link
(nicholasdecker.substack.com)

Bryan Johnson and a search for healthy longevity

NancyLebovitz · Jul 27, 2024, 3:28 PM
18 points
17 comments · 1 min read · LW link

What are matching markets?

ohmurphy · Jul 27, 2024, 3:05 PM
12 points
0 comments · 8 min read · LW link
(ohmurphy.substack.com)

Safety consultations for AI lab employees

Zach Stein-Perlman · Jul 27, 2024, 3:00 PM
181 points
4 comments · 1 min read · LW link

The Case Against UBI

Zero Contradictions · Jul 27, 2024, 6:36 AM
−1 points
2 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

Unlocking Solutions—By Understanding Coordination Problems

James Stephen Brown · Jul 27, 2024, 4:52 AM
56 points
4 comments · 5 min read · LW link
(nonzerosum.games)

Utilitarianism and the replaceability of desires and attachments

MichaelStJules · Jul 27, 2024, 1:57 AM
5 points
2 comments · LW link

Inspired by: Failures in Kindness

X4vier · Jul 27, 2024, 1:21 AM
60 points
2 comments · 3 min read · LW link

My Experience Using Gamification

Wyatt S · Jul 26, 2024, 11:06 PM
13 points
4 comments · 4 min read · LW link

How the AI safety technical landscape has changed in the last year, according to some practitioners

tlevin · Jul 26, 2024, 7:06 PM
55 points
6 comments · 2 min read · LW link

A Visual Task that’s Hard for GPT-4o, but Doable for Primary Schoolers

Lennart Finke · Jul 26, 2024, 5:51 PM
25 points
6 comments · 2 min read · LW link

Unaligned AI is coming regardless.

verbalshadow · Jul 26, 2024, 4:41 PM
−15 points
3 comments · 2 min read · LW link

Index of rationalist groups in the Bay Area July 2024

Jul 26, 2024, 4:32 PM
39 points
14 comments · 2 min read · LW link

End Single Family Zoning by Overturning Euclid V Ambler

Maxwell Tabarrok · Jul 26, 2024, 2:08 PM
32 points
1 comment · 7 min read · LW link
(www.maximum-progress.com)

Common Uses of “Acceptance”

Yi-Yang · Jul 26, 2024, 11:18 AM
14 points
5 comments · 24 min read · LW link

Universal Basic Income and Poverty

Eliezer Yudkowsky · Jul 26, 2024, 7:23 AM
328 points
141 comments · 9 min read · LW link

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication

Jul 26, 2024, 12:33 AM
93 points
2 comments · 13 min read · LW link

What does a Gambler’s Verity world look like?

ErioirE · Jul 25, 2024, 10:03 PM
7 points
6 comments · 1 min read · LW link