Recommendation: reports on the search for missing hiker Bill Ewasko

eukaryote · Jul 31, 2024, 10:15 PM
169 points
28 comments · 14 min read · LW link
(eukaryotewritesblog.com)

Economics 101 predicted the failure of special card payments for refugees; 3 months later, the whole of Germany wants to adopt it

Yanling Guo · Jul 31, 2024, 9:09 PM
3 points
3 comments · 2 min read · LW link

Ambiguity in Prediction Market Resolution is Still Harmful

aphyer · Jul 31, 2024, 8:32 PM
43 points
17 comments · 3 min read · LW link

AI labs can boost external safety research

Zach Stein-Perlman · Jul 31, 2024, 7:30 PM
31 points
1 comment · 1 min read · LW link

Women in AI Safety London Meetup

njg · Jul 31, 2024, 6:13 PM
1 point
0 comments · 1 min read · LW link

Constructing Neural Network Parameters with Downstream Trainability

ch271828n · Jul 31, 2024, 6:13 PM
1 point
0 comments · 1 min read · LW link
(github.com)

Want to work on US emerging tech policy? Consider the Horizon Fellowship.

Elika · Jul 31, 2024, 6:12 PM
4 points
0 comments · 1 min read · LW link

[Question] What are your cruxes for imprecise probabilities / decision rules?

Anthony DiGiovanni · Jul 31, 2024, 3:42 PM
36 points
33 comments · 1 min read · LW link

The new UK government’s stance on AI safety

Elliot Mckernon · Jul 31, 2024, 3:23 PM
17 points
0 comments · 4 min read · LW link

Cat Sustenance Fortification

jefftk · Jul 31, 2024, 2:30 AM
14 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Twitter thread on open-source AI

Richard_Ngo · Jul 31, 2024, 12:26 AM
33 points
6 comments · 2 min read · LW link
(x.com)

Twitter thread on AI takeover scenarios

Richard_Ngo · Jul 31, 2024, 12:24 AM
37 points
0 comments · 2 min read · LW link
(x.com)

Twitter thread on AI safety evals

Richard_Ngo · Jul 31, 2024, 12:18 AM
63 points
3 comments · 2 min read · LW link
(x.com)

Twitter thread on politics of AI safety

Richard_Ngo · Jul 31, 2024, 12:00 AM
35 points
2 comments · 1 min read · LW link
(x.com)

An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet) · Jul 30, 2024, 9:44 PM
21 points
1 comment · 1 min read · LW link
(arxiv.org)

Open Source Automated Interpretability for Sparse Autoencoder Features

Jul 30, 2024, 9:11 PM
67 points
1 comment · 13 min read · LW link
(blog.eleuther.ai)

Caterpillars and Philosophy

Zero Contradictions · Jul 30, 2024, 8:54 PM
2 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

François Chollet on the limitations of LLMs in reasoning

2PuNCheeZ · Jul 30, 2024, 8:04 PM
1 point
1 comment · 2 min read · LW link
(x.com)

Against AI As An Existential Risk

Noah Birnbaum · Jul 30, 2024, 7:10 PM
6 points
13 comments · 1 min read · LW link
(irrationalitycommunity.substack.com)

[Question] Is objective morality self-defeating?

dialectica · Jul 30, 2024, 6:23 PM
−4 points
3 comments · 2 min read · LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten · Jul 30, 2024, 4:36 PM
6 points
0 comments · 9 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
223 points
51 comments · 12 min read · LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

Jul 30, 2024, 3:41 PM
32 points
0 comments · 15 min read · LW link

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?

scasper · Jul 30, 2024, 2:57 PM
25 points
0 comments · 4 min read · LW link

RTFB: California’s AB 3211

Zvi · Jul 30, 2024, 1:10 PM
62 points
2 comments · 11 min read · LW link
(thezvi.wordpress.com)

If You Can Climb Up, You Can Climb Down

jefftk · Jul 30, 2024, 12:00 AM
34 points
9 comments · 1 min read · LW link
(www.jefftk.com)

What is Morality?

Zero Contradictions · Jul 29, 2024, 7:19 PM
−1 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Arch-anarchism and immortality

Peter lawless · Jul 29, 2024, 6:10 PM
−5 points
1 comment · 2 min read · LW link

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering

Jul 29, 2024, 5:50 PM
17 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

New Blog Post Against AI Doom

Noah Birnbaum · Jul 29, 2024, 5:21 PM
1 point
5 comments · 1 min read · LW link
(substack.com)

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan · Jul 29, 2024, 2:50 PM
9 points
3 comments · 1 min read · LW link

[Question] How does tokenization influence prompting?

Boris Kashirin · Jul 29, 2024, 10:28 AM
9 points
4 comments · 1 min read · LW link

Understanding Positional Features in Layer 0 SAEs

Jul 29, 2024, 9:36 AM
43 points
0 comments · 5 min read · LW link

Prediction Markets Explained

Benjamin_Sturisky · Jul 29, 2024, 8:02 AM
8 points
0 comments · 9 min read · LW link

Relativity Theory for What the Future ‘You’ Is and Isn’t

FlorianH · Jul 29, 2024, 2:01 AM
7 points
49 comments · 4 min read · LW link

Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought

cleanwhiteroom · Jul 28, 2024, 7:55 PM
2 points
2 comments · 2 min read · LW link

Making Beliefs Pay Rent

Jul 28, 2024, 5:59 PM
7 points
2 comments · 1 min read · LW link

This is already your second chance

Malmesbury · Jul 28, 2024, 5:13 PM
185 points
13 comments · 8 min read · LW link

[Question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?

kaler · Jul 28, 2024, 12:23 PM
10 points
14 comments · 1 min read · LW link

Family and Society

Zero Contradictions · Jul 28, 2024, 7:05 AM
1 point
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

[Question] What is AI Safety’s line of retreat?

Remmelt · Jul 28, 2024, 5:43 AM
12 points
12 comments · LW link

AXRP Episode 34 - AI Evaluations with Beth Barnes

DanielFilan · Jul 28, 2024, 3:30 AM
23 points
0 comments · 69 min read · LW link

Rats, Back a Candidate

Blake · Jul 28, 2024, 3:19 AM
−40 points
19 comments · 1 min read · LW link

AI existential risk probabilities are too unreliable to inform policy

Oleg Trott · Jul 28, 2024, 12:59 AM
18 points
5 comments · 1 min read · LW link
(www.aisnakeoil.com)

Idle Speculations on Pipeline Parallelism

DaemonicSigil · Jul 27, 2024, 10:40 PM
1 point
0 comments · 4 min read · LW link
(pbement.com)

Re: Anthropic’s suggested SB-1047 amendments

RobertM · Jul 27, 2024, 10:32 PM
87 points
13 comments · 9 min read · LW link
(www.documentcloud.org)

The problem with psychology is that it has no theory.

Nicholas D. · Jul 27, 2024, 7:36 PM
2 points
7 comments · 4 min read · LW link
(nicholasdecker.substack.com)

Bryan Johnson and a search for healthy longevity

NancyLebovitz · Jul 27, 2024, 3:28 PM
18 points
17 comments · 1 min read · LW link

What are matching markets?

ohmurphy · Jul 27, 2024, 3:05 PM
12 points
0 comments · 8 min read · LW link
(ohmurphy.substack.com)

Safety consultations for AI lab employees

Zach Stein-Perlman · Jul 27, 2024, 3:00 PM
181 points
4 comments · 1 min read · LW link