Historical Notes on Charitable Funds

jefftk

4 Dec 2022 23:30 UTC

28 points

0 comments 3 min read LW link

(www.jefftk.com)

AGI as a Black Swan Event

Stephen McAleese

4 Dec 2022 23:00 UTC

8 points

8 comments 7 min read LW link

South Bay ACX/LW Pre-Holiday Get-Together

IS

4 Dec 2022 22:57 UTC

10 points

0 comments 1 min read LW link

ChatGPT is settling the Chinese Room argument

averros

4 Dec 2022 20:25 UTC

−7 points

7 comments 1 min read LW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan

4 Dec 2022 18:48 UTC

29 points

6 comments 1 min read LW link

Open & Welcome Thread—December 2022

niplav

4 Dec 2022 15:06 UTC

8 points

22 comments 1 min read LW link

AI can exploit safety plans posted on the Internet

Peter S. Park

4 Dec 2022 12:17 UTC

−15 points

4 comments 1 min read LW link

ChatGPT seems overconfident to me

qbolec

4 Dec 2022 8:03 UTC

19 points

3 comments 16 min read LW link

Could an AI be Religious?

mk54

4 Dec 2022 5:00 UTC

−12 points

14 comments 1 min read LW link

Can GPT-3 Write Contra Dances?

jefftk

4 Dec 2022 3:00 UTC

6 points

4 comments 10 min read LW link

(www.jefftk.com)

Take 3: No indescribable heavenworlds.

Charlie Steiner

4 Dec 2022 2:48 UTC

32 points

12 comments 2 min read LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution

4 Dec 2022 1:53 UTC

60 points

30 comments 3 min read LW link

(www.pnas.org)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj

3 Dec 2022 20:32 UTC

1 point

8 comments 1 min read LW link

Logical induction for software engineers

Alex Flint

3 Dec 2022 19:55 UTC

163 points

8 comments 27 min read LW link 1 review

Utilitarianism is the only option

aelwood

3 Dec 2022 17:14 UTC

−12 points

7 comments 6 min read LW link

(pursuingreality.substack.com)

Our 2022 Giving

jefftk

3 Dec 2022 15:40 UTC

33 points

0 comments 1 min read LW link

(www.jefftk.com)

[Question] Is school good or bad?

tailcalled

3 Dec 2022 13:14 UTC

10 points

76 comments 1 min read LW link

MrBeast’s Squid Game Tricked Me

lsusr

3 Dec 2022 5:50 UTC

76 points

1 comment 2 min read LW link

Great Cryonics Survey of 2022

Mati_Roy

3 Dec 2022 5:10 UTC

16 points

0 comments 1 min read LW link

Causal scrubbing: results on induction heads

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:59 UTC

34 points

1 comment 17 min read LW link

Causal scrubbing: results on a paren balance checker

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:59 UTC

39 points

2 comments 30 min read LW link

Causal scrubbing: Appendix

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:58 UTC

18 points

4 comments 20 min read LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:58 UTC

208 points

35 comments 20 min read LW link 1 review

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner

3 Dec 2022 0:54 UTC

17 points

1 comment 2 min read LW link

D&D.Sci December 2022: The Boojumologist

abstractapplic

2 Dec 2022 23:39 UTC

32 points

9 comments 2 min read LW link

Subsets and quotients in interpretability

Erik Jenner

2 Dec 2022 23:13 UTC

26 points

1 comment 7 min read LW link

Research Principles for 6 Months of AI Alignment Studies

Shoshannah Tekofsky

2 Dec 2022 22:55 UTC

23 points

3 comments 6 min read LW link

Three Fables of Magical Girls and Longtermism

Ulisse Mini

2 Dec 2022 22:01 UTC

33 points

11 comments 2 min read LW link

Brun’s theorem and sieve theory

Ege Erdil

2 Dec 2022 20:57 UTC

31 points

1 comment 73 min read LW link

Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan]

hannah wing-yee

2 Dec 2022 20:45 UTC

3 points

0 comments 2 min read LW link

Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis

NickGabs

2 Dec 2022 19:14 UTC

17 points

3 comments 4 min read LW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien

2 Dec 2022 16:33 UTC

45 points

8 comments 5 min read LW link

NeurIPS Safety & ChatGPT. MLAISU W48

Esben Kran and Steinthal

2 Dec 2022 15:50 UTC

3 points

0 comments 4 min read LW link

(newsletter.apartresearch.com)

[Question] Is ChatGPT right when advising to brush the tongue when brushing teeth?

ChristianKl

2 Dec 2022 14:53 UTC

13 points

14 comments 2 min read LW link

Jailbreaking ChatGPT on Release Day

Zvi

2 Dec 2022 13:10 UTC

243 points

77 comments 6 min read LW link 1 review

(thezvi.wordpress.com)

Deconfusing Direct vs Amortised Optimization

beren

2 Dec 2022 11:30 UTC

137 points

19 comments 10 min read LW link

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

2 Dec 2022 2:43 UTC

146 points

23 comments 47 min read LW link 3 reviews

New Feature: Collaborative editing now supports logged-out users

RobertM

2 Dec 2022 2:41 UTC

10 points

0 comments 1 min read LW link

Mastering Stratego (Deepmind)

svemirski

2 Dec 2022 2:21 UTC

6 points

0 comments 1 min read LW link

(www.deepmind.com)

Update on Harvard AI Safety Team and MIT AI Alignment

Xander Davies, Sam Marks, kaivu, tlevin, leni, maxnadeau and Naomi Bashkansky

2 Dec 2022 0:56 UTC

60 points

4 comments 8 min read LW link

Quick look: cognitive damage from well-administered anesthesia

Elizabeth

2 Dec 2022 0:40 UTC

28 points

0 comments 4 min read LW link

(acesounderglass.com)

Against meta-ethical hedonism

Joe Carlsmith

2 Dec 2022 0:23 UTC

25 points

5 comments 35 min read LW link

Lumenators for very lazy British people

shakeelh

2 Dec 2022 0:18 UTC

16 points

3 comments 1 min read LW link

Understanding goals in complex systems

Johannes C. Mayer

1 Dec 2022 23:49 UTC

9 points

0 comments 1 min read LW link

(www.youtube.com)

A challenge for AGI organizations, and a challenge for readers

Rob Bensinger and Eliezer Yudkowsky

1 Dec 2022 23:11 UTC

304 points

33 comments 2 min read LW link

Playing with Aerial Photos

jefftk

1 Dec 2022 22:50 UTC

9 points

0 comments 1 min read LW link

(www.jefftk.com)

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner

1 Dec 2022 22:41 UTC

38 points

4 comments 4 min read LW link

Re-Examining LayerNorm

Eric Winsor

1 Dec 2022 22:20 UTC

128 points

12 comments 5 min read LW link

The LessWrong 2021 Review: Intellectual Circle Expansion

Ruby and Raemon

1 Dec 2022 21:17 UTC

95 points

55 comments 8 min read LW link

The Plan − 2022 Update

johnswentworth

1 Dec 2022 20:43 UTC

240 points

37 comments 8 min read LW link 1 review