A first success story for Outer Alignment: InstructGPT

Noosphere89 8 Nov 2022 22:52 UTC

6 points

1 comment 1 min read LW link

(openai.com)

Trying Mastodon

jefftk 8 Nov 2022 19:10 UTC

12 points

4 comments 1 min read LW link

(www.jefftk.com)

Inverse scaling can become U-shaped

Edouard Harris 8 Nov 2022 19:04 UTC

27 points

15 comments 1 min read LW link

(arxiv.org)

People care about each other even though they have imperfect motivational pointers?

TurnTrout 8 Nov 2022 18:15 UTC

33 points

25 comments 7 min read LW link

Applying superintelligence without collusion

Eric Drexler 8 Nov 2022 18:08 UTC

109 points

63 comments 4 min read LW link

[Question] Binance is buying FTX.com: How did it happen and what are the implications?

Caerulean 8 Nov 2022 17:14 UTC

16 points

6 comments 1 min read LW link

Some advice on independent research

Marius Hobbhahn 8 Nov 2022 14:46 UTC

57 points

5 comments 10 min read LW link

Mysteries of mode collapse

janus 8 Nov 2022 10:37 UTC

303 points

57 comments 14 min read LW link 1 review

[ASoT] Thoughts on GPT-N

Ulisse Mini 8 Nov 2022 7:14 UTC

8 points

0 comments 1 min read LW link

Michael Simm—Introducing Myself

Michael Simm 8 Nov 2022 5:45 UTC

4 points

0 comments 2 min read LW link

EA & LW Forums Weekly Summary (31st Oct − 6th Nov 22′)

Zoe Williams 8 Nov 2022 3:58 UTC

12 points

1 comment 18 min read LW link

[Question] Value of Querying 100+ People About Humanity’s Future

T431 8 Nov 2022 0:41 UTC

9 points

3 comments 2 min read LW link

How could we know that an AGI system will have good consequences?

So8res 7 Nov 2022 22:42 UTC

112 points

25 comments 5 min read LW link

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

Neel Nanda 7 Nov 2022 22:39 UTC

30 points

15 comments 3 min read LW link

(youtu.be)

Intercept article about lab accidents

ChristianKl 7 Nov 2022 21:10 UTC

23 points

9 comments 1 min read LW link

(theintercept.com)

The biological function of love for non-kin is to gain the trust of people we cannot deceive

chaosmage 7 Nov 2022 20:26 UTC

43 points

3 comments 8 min read LW link

Distillation Experiment: Chunk-Knitting

DirectedEvolution 7 Nov 2022 19:56 UTC

10 points

3 comments 6 min read LW link

Thinking About Mastodon

jefftk 7 Nov 2022 19:40 UTC

33 points

17 comments 1 min read LW link

(www.jefftk.com)

[Question] Ideas for tiny research projects related to rationality?

Frej 7 Nov 2022 18:45 UTC

3 points

1 comment 1 min read LW link

Loss of control of AI is not a likely source of AI x-risk

squek 7 Nov 2022 18:44 UTC

−6 points

0 comments 5 min read LW link

AI Safety Unconference NeurIPS 2022

Orpheus 7 Nov 2022 15:39 UTC

25 points

0 comments 1 min read LW link

(aisafetyevents.org)

Hacker-AI – Does it already exist?

Erland Wittkotter 7 Nov 2022 14:01 UTC

3 points

12 comments 11 min read LW link

What’s the Deal with Elon Musk and Twitter?

Zvi 7 Nov 2022 13:50 UTC

60 points

13 comments 31 min read LW link

(thezvi.wordpress.com)

How to Make Easy Decisions

lynettebye 7 Nov 2022 13:17 UTC

17 points

3 comments 2 min read LW link

Opportunities that surprised us during our Clearer Thinking Regrants program

spencerg 7 Nov 2022 13:09 UTC

20 points

0 comments 9 min read LW link

4 Key Assumptions in AI Safety

Prometheus 7 Nov 2022 10:50 UTC

20 points

5 comments 7 min read LW link

Google Search as a Washed Up Service Dog: “I HALP!”

Shmi 7 Nov 2022 7:02 UTC

20 points

8 comments 1 min read LW link

[Book Review] “Station Eleven” by Emily St. John Mandel

lsusr 7 Nov 2022 5:56 UTC

17 points

1 comment 1 min read LW link

Counterfactability

Scott Garrabrant 7 Nov 2022 5:39 UTC

40 points

5 comments 11 min read LW link

2022 LessWrong Census?

Matt He 7 Nov 2022 5:16 UTC

67 points

13 comments 1 min read LW link

A philosopher’s critique of RLHF

TW123 7 Nov 2022 2:42 UTC

55 points

8 comments 2 min read LW link

[Question] Is there any discussion on avoiding being Dutch-booked or otherwise taken advantage of one’s bounded rationality by refusing to engage?

Shmi 7 Nov 2022 2:36 UTC

38 points

29 comments 1 min read LW link

Exams-Only Universities

Mati_Roy 6 Nov 2022 22:05 UTC

80 points

40 comments 2 min read LW link

Democracy Is in Danger, but Not for the Reasons You Think

ExCeph 6 Nov 2022 21:15 UTC

−7 points

4 comments 12 min read LW link

(ginnungagapfoundation.wordpress.com)

Playground Game: Monster

jefftk 6 Nov 2022 16:00 UTC

14 points

4 comments 1 min read LW link

(www.jefftk.com)

[Question] Has Pascal’s Mugging problem been completely solved yet?

EniScien 6 Nov 2022 12:52 UTC

3 points

11 comments 1 min read LW link

[Question] Should I Pursue a PhD?

DragonGod 6 Nov 2022 10:58 UTC

8 points

8 comments 2 min read LW link

You won’t solve alignment without agent foundations

Mikhail Samin 6 Nov 2022 8:07 UTC

29 points

3 comments 8 min read LW link

Word-Distance vs Idea-Distance: The Case for Lanoitaring

Sable 6 Nov 2022 5:25 UTC

7 points

7 comments 7 min read LW link

(affablyevil.substack.com)

Apple Cider Syrup

jefftk 6 Nov 2022 2:10 UTC

11 points

6 comments 1 min read LW link

(www.jefftk.com)

What is epigenetics?

Metacelsus 6 Nov 2022 1:24 UTC

78 points

4 comments 6 min read LW link

(denovo.substack.com)

Response

Jarred Filmer 6 Nov 2022 1:03 UTC

29 points

2 comments 12 min read LW link

[Question] Has anyone increased their AGI timelines?

Darren McKee 6 Nov 2022 0:03 UTC

39 points

12 comments 1 min read LW link

Takeaways from a survey on AI alignment resources

DanielFilan 5 Nov 2022 23:40 UTC

73 points

10 comments 6 min read LW link 1 review

(danielfilan.com)

Unpricable Information and Certificate Hell

eva_ 5 Nov 2022 22:56 UTC

13 points

2 comments 6 min read LW link

Recommend HAIST resources for assessing the value of RLHF-related alignment research

Sam Marks and Xander Davies

5 Nov 2022 20:58 UTC

26 points

9 comments 3 min read LW link

Instead of technical research, more people should focus on buying time

Orpheus16, Olive Branch and Thomas Larsen

5 Nov 2022 20:43 UTC

101 points

45 comments 14 min read LW link

Provably Honest—A First Step

Srijanak De 5 Nov 2022 19:18 UTC

10 points

2 comments 8 min read LW link

Should AI focus on problem-solving or strategic planning? Why not both?

Oliver Siegel 5 Nov 2022 19:17 UTC

−12 points

3 comments 1 min read LW link

How to store human values on a computer

Oliver Siegel 5 Nov 2022 19:17 UTC

−12 points

17 comments 1 min read LW link