All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 202120222023 2024 2025

All Jan Feb Mar Apr May Jun Jul Aug Sep Oct NovDec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 272829 30 31

Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state?

No77e27 Dec 2022 20:57 UTC

5 points

3 comments1 min readLW link

Crypto-currency as pro-alignment mechanism

False Name27 Dec 2022 17:45 UTC

−10 points

2 comments2 min readLW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI27 Dec 2022 17:27 UTC

50 points

0 comments4 min readLW link

(aizi.substack.com)

Things that can kill you quickly: What everyone should know about first aid

jasoncrawford27 Dec 2022 16:23 UTC

167 points

21 comments2 min readLW link 1 review

(jasoncrawford.org)

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod27 Dec 2022 15:49 UTC

118 points

84 comments3 min readLW link

Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths

Remmelt27 Dec 2022 15:40 UTC

−16 points

8 comments2 min readLW link

(mflb.com)

Mere exposure effect: Bias in Evaluating AGI X-Risks

Remmelt and flandry19

27 Dec 2022 14:05 UTC

0 points

2 comments1 min readLW link

Housing and Transportation Roundup #2

Zvi27 Dec 2022 13:10 UTC

25 points

0 comments12 min readLW link

(thezvi.wordpress.com)

[Question] Are tulpas moral patients?

ChristianKl27 Dec 2022 11:30 UTC

16 points

28 comments1 min readLW link

Reflections on my 5-month alignment upskilling grant

Jay Bailey27 Dec 2022 10:51 UTC

82 points

4 comments8 min readLW link

Institutions Cannot Restrain Dark-Triad AI Exploitation

Remmelt and flandry19

27 Dec 2022 10:34 UTC

5 points

0 comments5 min readLW link

(mflb.com)

Introduction: Bias in Evaluating AGI X-Risks

Remmelt and flandry19

27 Dec 2022 10:27 UTC

1 point

0 comments3 min readLW link

MDPs and the Bellman Equation, Intuitively Explained

Jack O'Brien27 Dec 2022 5:50 UTC

11 points

3 comments14 min readLW link

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics

Remmelt and flandry19

27 Dec 2022 3:16 UTC

−2 points

5 comments2 min readLW link

(mflb.com)

Nine Points of Collective Insanity

Remmelt and flandry19

27 Dec 2022 3:14 UTC

−2 points

3 comments1 min readLW link

(mflb.com)

Fractional Resignation

jefftk27 Dec 2022 2:30 UTC

19 points

6 comments1 min readLW link

(www.jefftk.com)

[Question] What policies have most thoroughly crippled (otherwise-promising) industries or technologies?

benwr27 Dec 2022 2:25 UTC

40 points

4 comments1 min readLW link

Recent advances in Natural Language Processing—Some Woolly speculations (2019 essay on semantics and language models)

philosophybear27 Dec 2022 2:11 UTC

1 point

0 comments7 min readLW link

Against Agents as an Approach to Aligned Transformative AI

DragonGod27 Dec 2022 0:47 UTC

12 points

9 comments2 min readLW link

Can we efficiently distinguish different mechanisms?

paulfchristiano27 Dec 2022 0:20 UTC

91 points

30 comments16 min readLW link

(ai-alignment.com)

Air-gapping evaluation and support

Ryan Kidd26 Dec 2022 22:52 UTC

53 points

1 comment2 min readLW link

Slightly against aligning with neo-luddites

Matthew Barnett26 Dec 2022 22:46 UTC

104 points

31 comments4 min readLW link

Avoiding perpetual risk from TAI

scasper26 Dec 2022 22:34 UTC

15 points

6 comments5 min readLW link

Announcing: The Independent AI Safety Registry

Shoshannah Tekofsky26 Dec 2022 21:22 UTC

53 points

9 comments1 min readLW link

Are men harder to help?

braces26 Dec 2022 21:11 UTC

36 points

1 comment2 min readLW link

[Question] How much should I update on the fact that my dentist is named Dennis?

MichaelDickens26 Dec 2022 19:11 UTC

2 points

3 comments1 min readLW link

Theodicy and the simulation hypothesis, or: The problem of simulator evil

philosophybear26 Dec 2022 18:55 UTC

12 points

12 comments19 min readLW link

(philosophybear.substack.com)

Safety of Self-Assembled Neuromorphic Hardware

Can26 Dec 2022 18:51 UTC

16 points

2 comments10 min readLW link

(forum.effectivealtruism.org)

Coherent extrapolated dreaming

Alex Flint26 Dec 2022 17:29 UTC

38 points

10 comments17 min readLW link

An overview of some promising work by junior alignment researchers

Orpheus1626 Dec 2022 17:23 UTC

34 points

0 comments4 min readLW link

Solstice song: Here Lies the Dragon

jchan26 Dec 2022 16:08 UTC

8 points

1 comment2 min readLW link

The Usefulness Paradigm

Aprillion26 Dec 2022 13:23 UTC

4 points

4 comments1 min readLW link

Looking Back on Posts From 2022

Zvi26 Dec 2022 13:20 UTC

50 points

8 comments17 min readLW link

(thezvi.wordpress.com)

Analogies between Software Reverse Engineering and Mechanistic Interpretability

Neel Nanda and Itay Yona

26 Dec 2022 12:26 UTC

34 points

6 comments11 min readLW link

(www.neelnanda.io)

Mlyyrczo

lsusr26 Dec 2022 7:58 UTC

44 points

14 comments3 min readLW link

Causal abstractions vs infradistributions

Pablo Villalobos26 Dec 2022 0:21 UTC

24 points

0 comments6 min readLW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda25 Dec 2022 22:21 UTC

57 points

7 comments12 min readLW link

(www.neelnanda.io)

It’s time to worry about online privacy again

Malmesbury25 Dec 2022 21:05 UTC

71 points

23 comments6 min readLW link

[Hebbian Natural Abstractions] Mathematical Foundations

Samuel Nellessen and Jan

25 Dec 2022 20:58 UTC

15 points

2 comments6 min readLW link

(www.snellessen.com)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve25 Dec 2022 20:14 UTC

3 points

6 comments1 min readLW link

YCombinator fraud rates

Xodarap25 Dec 2022 19:21 UTC

56 points

3 comments4 min readLW link

How evolutionary lineages of LLMs can plan their own future and act on these plans

Roman Leventov25 Dec 2022 18:11 UTC

39 points

16 comments8 min readLW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis25 Dec 2022 16:50 UTC

33 points

38 comments9 min readLW link

ChatGPT is our Wright Brothers moment

Ron J25 Dec 2022 16:26 UTC

10 points

9 comments1 min readLW link

The Meditation on Winter

Raemon25 Dec 2022 16:12 UTC

59 points

3 comments3 min readLW link

I’ve updated towards AI boxing being surprisingly easy

Noosphere8925 Dec 2022 15:40 UTC

8 points

20 comments2 min readLW link

Take 14: Corrigibility isn’t that great.

Charlie Steiner25 Dec 2022 13:04 UTC

15 points

3 comments3 min readLW link

Simplified Level Up

jefftk25 Dec 2022 13:00 UTC

12 points

16 comments2 min readLW link

(www.jefftk.com)

Hyperfinite graphs ~ manifolds

Alok Singh25 Dec 2022 12:24 UTC

11 points

5 comments2 min readLW link

Inconsistent math is great

Alok Singh25 Dec 2022 3:20 UTC

1 point

2 comments1 min readLW link