[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike 5 Dec 2022 22:51 UTC

98 points

15 comments, 1 min read, LW link

(aligned.substack.com)

The No Free Lunch theorem for dummies

Steven Byrnes 5 Dec 2022 21:46 UTC

37 points

16 comments, 3 min read, LW link

ChatGPT and Ideological Turing Test

Viliam 5 Dec 2022 21:45 UTC

42 points

1 comment, 1 min read, LW link

ChatGPT on Spielberg’s A.I. and AI Alignment

Bill Benzon 5 Dec 2022 21:10 UTC

5 points

0 comments, 4 min read, LW link

Updating my AI timelines

Matthew Barnett 5 Dec 2022 20:46 UTC

145 points

50 comments, 2 min read, LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

Evan R. Murphy and Megan Kinniment

5 Dec 2022 20:28 UTC

40 points

19 comments, 10 min read, LW link

College Admissions as a Brutal One-Shot Game

devansh 5 Dec 2022 20:05 UTC

8 points

26 comments, 2 min read, LW link

Analysis of AI Safety surveys for field-building insights

Ash Jafari 5 Dec 2022 19:21 UTC

11 points

2 comments, 5 min read, LW link

Testing Ways to Bypass ChatGPT’s Safety Features

Robert_AIZI 5 Dec 2022 18:50 UTC

7 points

4 comments, 5 min read, LW link

(aizi.substack.com)

Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities

jacquesthibs 5 Dec 2022 16:09 UTC

28 points

6 comments, 8 min read, LW link

Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence

Ronny Fernandez 5 Dec 2022 15:19 UTC

19 points

5 comments, 7 min read, LW link

[Question] How should I judge the impact of giving $5k to a family of three kids and two mentally ill parents?

Blake 5 Dec 2022 13:42 UTC

19 points

15 comments, 1 min read, LW link

Is the “Valley of Confused Abstractions” real?

jacquesthibs 5 Dec 2022 13:36 UTC

20 points

11 comments, 2 min read, LW link

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner 5 Dec 2022 10:39 UTC

37 points

4 comments, 1 min read, LW link

[Question] What are some good Lesswrong-related accounts or hashtags on Mastodon that I should follow?

SpectrumDT 5 Dec 2022 9:42 UTC

2 points

0 comments, 1 min read, LW link

[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone?

Optimization Process 5 Dec 2022 9:12 UTC

72 points

54 comments, 1 min read, LW link

Monthly Shorts 11/22

Celer 5 Dec 2022 7:30 UTC

8 points

0 comments, 3 min read, LW link

(keller.substack.com)

A ChatGPT story about ChatGPT doom

Matt He 5 Dec 2022 5:40 UTC

6 points

2 comments, 4 min read, LW link

A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability

Yitz 5 Dec 2022 5:33 UTC

26 points

0 comments, 4 min read, LW link

Nook Nature

Duncan Sabien (Inactive) 5 Dec 2022 4:10 UTC

57 points

20 comments, 10 min read, LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd 5 Dec 2022 2:26 UTC

78 points

40 comments, 2 min read, LW link

Historical Notes on Charitable Funds

jefftk 4 Dec 2022 23:30 UTC

28 points

0 comments, 3 min read, LW link

(www.jefftk.com)

AGI as a Black Swan Event

Stephen McAleese 4 Dec 2022 23:00 UTC

8 points

8 comments, 7 min read, LW link

South Bay ACX/LW Pre-Holiday Get-Together

IS 4 Dec 2022 22:57 UTC

10 points

0 comments, 1 min read, LW link

ChatGPT is settling the Chinese Room argument

averros 4 Dec 2022 20:25 UTC

−7 points

7 comments, 1 min read, LW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan 4 Dec 2022 18:48 UTC

29 points

6 comments, 1 min read, LW link

Open & Welcome Thread—December 2022

niplav 4 Dec 2022 15:06 UTC

8 points

22 comments, 1 min read, LW link

AI can exploit safety plans posted on the Internet

Peter S. Park 4 Dec 2022 12:17 UTC

−15 points

4 comments, 1 min read, LW link

ChatGPT seems overconfident to me

qbolec 4 Dec 2022 8:03 UTC

19 points

3 comments, 16 min read, LW link

Could an AI be Religious?

mk54 4 Dec 2022 5:00 UTC

−12 points

14 comments, 1 min read, LW link

Can GPT-3 Write Contra Dances?

jefftk 4 Dec 2022 3:00 UTC

6 points

4 comments, 10 min read, LW link

(www.jefftk.com)

Take 3: No indescribable heavenworlds.

Charlie Steiner 4 Dec 2022 2:48 UTC

32 points

12 comments, 2 min read, LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution 4 Dec 2022 1:53 UTC

60 points

30 comments, 3 min read, LW link

(www.pnas.org)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj 3 Dec 2022 20:32 UTC

1 point

8 comments, 1 min read, LW link

Logical induction for software engineers

Alex Flint 3 Dec 2022 19:55 UTC

163 points

8 comments, 27 min read, LW link, 1 review

Utilitarianism is the only option

aelwood 3 Dec 2022 17:14 UTC

−12 points

7 comments, 6 min read, LW link

(pursuingreality.substack.com)

Our 2022 Giving

jefftk 3 Dec 2022 15:40 UTC

33 points

0 comments, 1 min read, LW link

(www.jefftk.com)

[Question] Is school good or bad?

tailcalled 3 Dec 2022 13:14 UTC

10 points

76 comments, 1 min read, LW link

MrBeast’s Squid Game Tricked Me

lsusr 3 Dec 2022 5:50 UTC

76 points

1 comment, 2 min read, LW link

Great Cryonics Survey of 2022

Mati_Roy 3 Dec 2022 5:10 UTC

16 points

0 comments, 1 min read, LW link

Causal scrubbing: results on induction heads

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:59 UTC

34 points

1 comment, 17 min read, LW link

Causal scrubbing: results on a paren balance checker

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:59 UTC

39 points

2 comments, 30 min read, LW link

Causal scrubbing: Appendix

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:58 UTC

18 points

4 comments, 20 min read, LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas

3 Dec 2022 0:58 UTC

208 points

35 comments, 20 min read, LW link, 1 review

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner 3 Dec 2022 0:54 UTC

17 points

1 comment, 2 min read, LW link

D&D.Sci December 2022: The Boojumologist

abstractapplic 2 Dec 2022 23:39 UTC

32 points

9 comments, 2 min read, LW link

Subsets and quotients in interpretability

Erik Jenner 2 Dec 2022 23:13 UTC

26 points

1 comment, 7 min read, LW link

Research Principles for 6 Months of AI Alignment Studies

Shoshannah Tekofsky 2 Dec 2022 22:55 UTC

23 points

3 comments, 6 min read, LW link

Three Fables of Magical Girls and Longtermism

Ulisse Mini 2 Dec 2022 22:01 UTC

33 points

11 comments, 2 min read, LW link

Brun’s theorem and sieve theory

Ege Erdil 2 Dec 2022 20:57 UTC

31 points

1 comment, 73 min read, LW link