What sorts of systems can be deceptive?

Andrei Alexandru · 31 Oct 2022 22:00 UTC
16 points
0 comments · 7 min read · LW link

“Cars and Elephants”: a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet) · 31 Oct 2022 21:26 UTC
48 points
25 comments · 2 min read · LW link

Superintelligent AI is necessary for an amazing future, but far from sufficient

So8res · 31 Oct 2022 21:16 UTC
132 points
48 comments · 34 min read · LW link

Sanity-checking in an age of hyperbole

Ciprian Elliu Ivanof · 31 Oct 2022 20:04 UTC
2 points
4 comments · 2 min read · LW link

Why Aren’t There More Schelling Holidays?

johnswentworth · 31 Oct 2022 19:31 UTC
63 points
21 comments · 1 min read · LW link

publishing alignment research and exfohazards

Tamsin Leake · 31 Oct 2022 18:02 UTC
80 points
12 comments · 1 min read · LW link · 1 review
(carado.moe)

The circular problem of epistemic irresponsibility

Roman Leventov · 31 Oct 2022 17:23 UTC
5 points
2 comments · 8 min read · LW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution

PashaKamyshev · 31 Oct 2022 17:03 UTC
7 points
4 comments · 14 min read · LW link

Average utilitarianism is non-local

Yair Halberstadt · 31 Oct 2022 16:36 UTC
29 points
13 comments · 1 min read · LW link

Marvel Snap: Phase 1

Zvi · 31 Oct 2022 15:20 UTC
23 points
1 comment · 14 min read · LW link
(thezvi.wordpress.com)

Boundaries vs Frames

Scott Garrabrant · 31 Oct 2022 15:14 UTC
58 points
10 comments · 7 min read · LW link

Embedding safety in ML development

zeshen · 31 Oct 2022 12:27 UTC
24 points
1 comment · 18 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · 31 Oct 2022 11:38 UTC
20 points
1 comment · 1 min read · LW link
(christophm.github.io)

My (naive) take on Risks from Learned Optimization

Artyom Karpov · 31 Oct 2022 10:59 UTC
7 points
0 comments · 5 min read · LW link

Tactical Nuclear Weapons Aren’t Cost-Effective Compared to Precision Artillery

Lao Mein · 31 Oct 2022 4:33 UTC
28 points
7 comments · 3 min read · LW link

Gandalf or Saruman? A Soldier in Scout’s Clothing

DirectedEvolution · 31 Oct 2022 2:40 UTC
41 points
1 comment · 4 min read · LW link

love, not competition

Tamsin Leake · 30 Oct 2022 19:44 UTC
30 points
20 comments · 1 min read · LW link
(carado.moe)

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes · 30 Oct 2022 19:15 UTC
26 points
1 comment · 1 min read · LW link
(braininspired.co)

“Normal” is the equilibrium state of past optimization processes

Alex_Altair · 30 Oct 2022 19:03 UTC
81 points
5 comments · 5 min read · LW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification

PashaKamyshev · 30 Oct 2022 16:57 UTC
9 points
0 comments · 10 min read · LW link

Instrumental ignoring AI, Dumb but not useless.

Donald Hobson · 30 Oct 2022 16:55 UTC
7 points
6 comments · 2 min read · LW link

Weekly Roundup #3

Zvi · 30 Oct 2022 12:20 UTC
23 points
5 comments · 15 min read · LW link
(thezvi.wordpress.com)

Quickly refactoring the U.S. Constitution

lc · 30 Oct 2022 7:17 UTC
7 points
25 comments · 4 min read · LW link

«Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch · 30 Oct 2022 6:31 UTC
86 points
20 comments · 15 min read · LW link

Am I secretly excited for AI getting weird?

porby · 29 Oct 2022 22:16 UTC
115 points
4 comments · 4 min read · LW link

AI as a Civilizational Risk Part 1/6: Historical Priors

PashaKamyshev · 29 Oct 2022 21:59 UTC
2 points
2 comments · 7 min read · LW link

Don’t expect your life partner to be better than your exes in more than one way: a mathematical model

mdd · 29 Oct 2022 18:47 UTC
7 points
1 comment · 9 min read · LW link

The Social Recession: By the Numbers

antonomon · 29 Oct 2022 18:45 UTC
165 points
29 comments · 8 min read · LW link
(novum.substack.com)

Electric Kettle vs Stove

jefftk · 29 Oct 2022 12:50 UTC
18 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Quantum Immortality, foiled

Ben · 29 Oct 2022 11:00 UTC
27 points
4 comments · 2 min read · LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
99 points
9 comments · 9 min read · LW link · 2 reviews
(arxiv.org)

Resources that (I think) new alignment researchers should know about

Akash · 28 Oct 2022 22:13 UTC
77 points
9 comments · 4 min read · LW link

How often does One Person succeed?

Mayank Modi · 28 Oct 2022 19:32 UTC
1 point
3 comments · 1 min read · LW link

aisafety.community—A living document of AI safety communities

28 Oct 2022 17:50 UTC
57 points
23 comments · 1 min read · LW link

Rapid Test Throat Swabbing?

jefftk · 28 Oct 2022 16:30 UTC
18 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Join the interpretability research hackathon

Esben Kran · 28 Oct 2022 16:26 UTC
15 points
0 comments · 1 min read · LW link

Syncretism

Annapurna · 28 Oct 2022 16:08 UTC
16 points
4 comments · 1 min read · LW link
(jorgevelez.substack.com)

Pondering computation in the real world

Adam Shai · 28 Oct 2022 15:57 UTC
24 points
13 comments · 5 min read · LW link

Ukraine and the Crimea Question

ChristianKl · 28 Oct 2022 12:26 UTC
−2 points
153 comments · 11 min read · LW link

New book on s-risks

Tobias_Baumann · 28 Oct 2022 9:36 UTC
68 points
1 comment · 1 min read · LW link

Cryptic symbols

Adam Scherlis · 28 Oct 2022 6:44 UTC
6 points
17 comments · 1 min read · LW link
(adam.scherlis.com)

All life’s helpers’ beliefs

Tehdastehdas · 28 Oct 2022 5:47 UTC
−12 points
1 comment · 5 min read · LW link

Prizes for ML Safety Benchmark Ideas

joshc · 28 Oct 2022 2:51 UTC
36 points
4 comments · 1 min read · LW link

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL · 28 Oct 2022 1:53 UTC
−22 points
4 comments · 9 min read · LW link

Anatomy of change

Jose Miguel Cruz y Celis · 28 Oct 2022 1:21 UTC
1 point
0 comments · 1 min read · LW link

Nash equilibria of symmetric zero-sum games

Ege Erdil · 27 Oct 2022 23:50 UTC
14 points
0 comments · 14 min read · LW link

[Question] Good psychology books/books that contain good psychological models?

shuffled-cantaloupe · 27 Oct 2022 23:04 UTC
1 point
1 comment · 1 min read · LW link

Podcast: The Left and Effective Altruism with Habiba Islam

garrison · 27 Oct 2022 17:41 UTC
2 points
2 comments · 1 min read · LW link

Lessons from ‘Famine, Affluence, and Morality’ and its reflection on today.

Mayank Modi · 27 Oct 2022 17:20 UTC
4 points
0 comments · 1 min read · LW link

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89 · 27 Oct 2022 14:41 UTC
12 points
20 comments · 1 min read · LW link