
[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj

3 Dec 2022 20:32 UTC

1 point

8 comments, 1 min read

Logical induction for software engineers

Alex Flint

3 Dec 2022 19:55 UTC

163 points

8 comments, 27 min read, 1 review

Utilitarianism is the only option

aelwood

3 Dec 2022 17:14 UTC

−12 points

7 comments, 6 min read

(pursuingreality.substack.com)

Our 2022 Giving

jefftk

3 Dec 2022 15:40 UTC

33 points

0 comments, 1 min read

(www.jefftk.com)

[Question] Is school good or bad?

tailcalled

3 Dec 2022 13:14 UTC

10 points

76 comments, 1 min read

MrBeast’s Squid Game Tricked Me

lsusr

3 Dec 2022 5:50 UTC

76 points

1 comment, 2 min read

Great Cryonics Survey of 2022

Mati_Roy

3 Dec 2022 5:10 UTC

16 points

0 comments, 1 min read

Causal scrubbing: results on induction heads

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:59 UTC

34 points

1 comment, 17 min read

Causal scrubbing: results on a paren balance checker

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:59 UTC

39 points

2 comments, 30 min read

Causal scrubbing: Appendix

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:58 UTC

18 points

4 comments, 20 min read

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:58 UTC

208 points

35 comments, 20 min read, 1 review

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner

3 Dec 2022 0:54 UTC

17 points

1 comment, 2 min read

D&D.Sci December 2022: The Boojumologist

abstractapplic

2 Dec 2022 23:39 UTC

32 points

9 comments, 2 min read

Subsets and quotients in interpretability

Erik Jenner

2 Dec 2022 23:13 UTC

26 points

1 comment, 7 min read

Research Principles for 6 Months of AI Alignment Studies

Shoshannah Tekofsky

2 Dec 2022 22:55 UTC

23 points

3 comments, 6 min read

Three Fables of Magical Girls and Longtermism

Ulisse Mini

2 Dec 2022 22:01 UTC

33 points

11 comments, 2 min read

Brun’s theorem and sieve theory

Ege Erdil

2 Dec 2022 20:57 UTC

31 points

1 comment, 73 min read

Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan]

hannah wing-yee

2 Dec 2022 20:45 UTC

3 points

0 comments, 2 min read

Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis

NickGabs

2 Dec 2022 19:14 UTC

17 points

3 comments, 4 min read

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien

2 Dec 2022 16:33 UTC

45 points

8 comments, 5 min read

NeurIPS Safety & ChatGPT. MLAISU W48

Esben Kran and Steinthal

2 Dec 2022 15:50 UTC

3 points

0 comments, 4 min read

(newsletter.apartresearch.com)

[Question] Is ChatGPT right when advising to brush the tongue when brushing teeth?

ChristianKl

2 Dec 2022 14:53 UTC

13 points

14 comments, 2 min read

Jailbreaking ChatGPT on Release Day

Zvi

2 Dec 2022 13:10 UTC

243 points

77 comments, 6 min read, 1 review

(thezvi.wordpress.com)

Deconfusing Direct vs Amortised Optimization

beren

2 Dec 2022 11:30 UTC

137 points

19 comments, 10 min read

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

2 Dec 2022 2:43 UTC

150 points

23 comments, 47 min read, 3 reviews

New Feature: Collaborative editing now supports logged-out users

RobertM

2 Dec 2022 2:41 UTC

10 points

0 comments, 1 min read

Mastering Stratego (Deepmind)

svemirski

2 Dec 2022 2:21 UTC

6 points

0 comments, 1 min read

(www.deepmind.com)

Update on Harvard AI Safety Team and MIT AI Alignment

Xander Davies, Sam Marks, kaivu, tlevin, leni, maxnadeau and Naomi Bashkansky

2 Dec 2022 0:56 UTC

60 points

4 comments, 8 min read

Quick look: cognitive damage from well-administered anesthesia

Elizabeth

2 Dec 2022 0:40 UTC

28 points

0 comments, 4 min read

(acesounderglass.com)

Against meta-ethical hedonism

Joe Carlsmith

2 Dec 2022 0:23 UTC

25 points

5 comments, 35 min read

Lumenators for very lazy British people

shakeelh

2 Dec 2022 0:18 UTC

16 points

3 comments, 1 min read

Understanding goals in complex systems

Johannes C. Mayer

1 Dec 2022 23:49 UTC

9 points

0 comments, 1 min read

(www.youtube.com)

A challenge for AGI organizations, and a challenge for readers

Rob Bensinger and Eliezer Yudkowsky

1 Dec 2022 23:11 UTC

304 points

33 comments, 2 min read

Playing with Aerial Photos

jefftk

1 Dec 2022 22:50 UTC

9 points

0 comments, 1 min read

(www.jefftk.com)

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner

1 Dec 2022 22:41 UTC

38 points

4 comments, 4 min read

Re-Examining LayerNorm

Eric Winsor

1 Dec 2022 22:20 UTC

128 points

12 comments, 5 min read

The LessWrong 2021 Review: Intellectual Circle Expansion

Ruby and Raemon

1 Dec 2022 21:17 UTC

95 points

55 comments, 8 min read

The Plan − 2022 Update

johnswentworth

1 Dec 2022 20:43 UTC

240 points

37 comments, 8 min read, 1 review

Finding gliders in the game of life

paulfchristiano

1 Dec 2022 20:40 UTC

104 points

8 comments, 16 min read

(ai-alignment.com)

The Machine Stops (Chapter 9)

Justin Bullock

1 Dec 2022 19:20 UTC

3 points

0 comments, 47 min read

Covid 12/1/22: China Protests

Zvi

1 Dec 2022 17:10 UTC

38 points

2 comments, 10 min read

(thezvi.wordpress.com)

ChatGPT: First Impressions

specbug

1 Dec 2022 16:36 UTC

18 points

2 comments, 13 min read

(sixeleven.in)

[LINK] - ChatGPT discussion

JanB

1 Dec 2022 15:04 UTC

13 points

8 comments, 1 min read

(openai.com)

Research request (alignment strategy): Deep dive on “making AI solve alignment for us”

JanB

1 Dec 2022 14:55 UTC

16 points

3 comments, 1 min read

Theories of impact for Science of Deep Learning

Marius Hobbhahn

1 Dec 2022 14:39 UTC

25 points

0 comments, 11 min read

Safe Development of Hacker-AI Countermeasures – What if we are too late?

Erland Wittkotter

1 Dec 2022 7:59 UTC

3 points

0 comments, 14 min read

Did ChatGPT just gaslight me?

TW123

1 Dec 2022 5:41 UTC

124 points

45 comments, 9 min read

(aiwatchtower.substack.com)

SBF’s comments on ethics are no surprise to virtue ethicists

c.trout

1 Dec 2022 4:18 UTC

36 points

30 comments, 16 min read

Notes on Caution

David Gross

1 Dec 2022 3:05 UTC

14 points

0 comments, 19 min read

Reestablishing Reliable Sources: A System for Tagging URLs

Riley Mueller

1 Dec 2022 2:27 UTC

7 points

1 comment, 3 min read