All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 202220232024 2025

All Jan FebMarApr May Jun Jul Aug Sep Oct Nov Dec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 293031

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs29 Mar 2023 23:16 UTC

293 points

297 comments3 min readLW link

(time.com)

Othello-GPT: Reflections on the Research Process

Neel Nanda29 Mar 2023 22:13 UTC

39 points

0 comments15 min readLW link

(neelnanda.io)

Othello-GPT: Future Work I Am Excited About

Neel Nanda29 Mar 2023 22:13 UTC

48 points

2 comments33 min readLW link

(neelnanda.io)

Actually, Othello-GPT Has A Linear Emergent World Representation

Neel Nanda29 Mar 2023 22:13 UTC

211 points

26 comments19 min readLW link

(neelnanda.io)

Draft: Detecting optimization

Alex_Altair29 Mar 2023 20:17 UTC

23 points

2 comments6 min readLW link

“Sorcerer’s Apprentice” from Fantasia as an analogy for alignment

awg29 Mar 2023 18:21 UTC

9 points

4 comments1 min readLW link

(video.disney.com)

The Changing Face of Twitter

Zvi29 Mar 2023 17:50 UTC

23 points

8 comments26 min readLW link

(thezvi.wordpress.com)

Nobody’s on the ball on AGI alignment

leopold29 Mar 2023 17:40 UTC

94 points

38 comments9 min readLW link

(www.forourposterity.com)

Want to win the AGI race? Solve alignment.

leopold29 Mar 2023 17:40 UTC

21 points

3 comments5 min readLW link

(www.forourposterity.com)

ChatGPT and Bing Chat can’t play Botticelli

Asha Saavoss29 Mar 2023 17:39 UTC

11 points

0 comments6 min readLW link

The Rationalist Guide to Hinduism

Harsha G.29 Mar 2023 17:03 UTC

25 points

12 comments9 min readLW link

(somestrangeloops.substack.com)

“Unintentional AI safety research”: Why not systematically mine AI technical research for safety purposes?

Jemal Young29 Mar 2023 15:56 UTC

27 points

3 comments6 min readLW link

The open letter

kornai29 Mar 2023 15:09 UTC

−21 points

2 comments1 min readLW link

I made AI Risk Propaganda

monkymind29 Mar 2023 14:26 UTC

−3 points

0 comments1 min readLW link

Strong Cheap Signals

trevor29 Mar 2023 14:18 UTC

29 points

3 comments2 min readLW link

(betonit.substack.com)

Missing forecasting tools: from catalogs to a new kind of prediction market

MichaelLatowicki29 Mar 2023 9:55 UTC

14 points

3 comments5 min readLW link

Spreadsheet for 200 Concrete Problems In Interpretability

Jay Bailey29 Mar 2023 6:51 UTC

13 points

0 comments1 min readLW link

[Question] Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)’s training corpus?

AnnaSalamon29 Mar 2023 5:17 UTC

49 points

2 comments1 min readLW link

[Question] Are there specific books that it might slightly help alignment to have on the internet?

AnnaSalamon29 Mar 2023 5:08 UTC

77 points

25 comments1 min readLW link

FLI open letter: Pause giant AI experiments

Zach Stein-Perlman29 Mar 2023 4:04 UTC

126 points

123 comments2 min readLW link

(futureoflife.org)

Run Posts By Orgs

jefftk29 Mar 2023 2:40 UTC

16 points

74 comments3 min readLW link

(www.jefftk.com)

Desensitizing Deepfakes

worse29 Mar 2023 1:20 UTC

1 point

0 comments1 min readLW link

Large language models aren’t trained enough

sanxiyn29 Mar 2023 0:56 UTC

5 points

4 comments1 min readLW link

(finbarr.ca)

Job Board (28 March 2033)

dr_s28 Mar 2023 22:44 UTC

20 points

1 comment3 min readLW link

Four lenses on AI risks

jasoncrawford28 Mar 2023 21:52 UTC

23 points

5 comments3 min readLW link

(rootsofprogress.org)

Some common confusion about induction heads

Alexandre Variengien28 Mar 2023 21:51 UTC

64 points

4 comments5 min readLW link

Draft: The optimization toolbox

Alex_Altair28 Mar 2023 20:40 UTC

20 points

1 comment7 min readLW link

Inching “Kubla Khan” and GPT into the same intellectual framework @ 3 Quarks Daily

Bill Benzon28 Mar 2023 19:50 UTC

5 points

0 comments3 min readLW link

A rough and incomplete review of some of John Wentworth’s research

So8res28 Mar 2023 18:52 UTC

175 points

18 comments18 min readLW link

[Question] How do you manage your inputs?

Mateusz Bagiński28 Mar 2023 18:26 UTC

15 points

2 comments1 min readLW link

Chatbot convinces Belgian to commit suicide

Jeroen De Ryck28 Mar 2023 18:14 UTC

60 points

18 comments3 min readLW link

(www.standaard.be)

A Primer On Chaos

johnswentworth28 Mar 2023 18:01 UTC

53 points

9 comments9 min readLW link

[Question] How likely are scenarios where AGI ends up overtly or de facto torturing us? How likely are scenarios where AGI prevents us from committing suicide or dying?

JohnGreer28 Mar 2023 18:00 UTC

11 points

4 comments1 min readLW link

How do we align humans and what does it mean for the new Conjecture’s strategy

Igor Ivanov28 Mar 2023 17:54 UTC

7 points

4 comments7 min readLW link

Governing High-Impact AI Systems: Understanding Canada’s Proposed AI Bill. April 15, Carleton University, Ottawa

Liav Koren28 Mar 2023 17:48 UTC

11 points

1 comment1 min readLW link

(forum.effectivealtruism.org)

I had a chat with GPT-4 on the future of AI and AI safety

Kristian Freed28 Mar 2023 17:47 UTC

1 point

0 comments8 min readLW link

LessWrong Hangout

Raymond Koopmanschap28 Mar 2023 17:47 UTC

0 points

0 comments1 min readLW link

Half-baked alignment idea

ozb28 Mar 2023 17:47 UTC

6 points

27 comments1 min readLW link

Some of My Current Impressions Entering AI Safety

worse28 Mar 2023 17:46 UTC

2 points

0 comments2 min readLW link

[Question] Why do the Sequences say that “Löb’s Theorem shows that a mathematical system cannot assert its own soundness without becoming inconsistent.”?

Thoth Hermes28 Mar 2023 17:19 UTC

12 points

30 comments1 min readLW link

Corrigibility, Self-Deletion, and Identical Strawberries

Robert_AIZI28 Mar 2023 16:54 UTC

9 points

2 comments6 min readLW link

(aizi.substack.com)

[Question] Why no major LLMs with memory?

Kaj_Sotala28 Mar 2023 16:34 UTC

42 points

15 comments1 min readLW link

Response to Tyler Cowen’s Existential risk, AI, and the inevitable turn in human history

Zvi28 Mar 2023 16:00 UTC

72 points

27 comments20 min readLW link

(thezvi.wordpress.com)

Adapting to Change: Overcoming Chronostasis in AI Language Models

RationalMindset28 Mar 2023 14:32 UTC

−1 points

0 comments6 min readLW link

Feeling Progress as Motivation

Sable28 Mar 2023 9:11 UTC

4 points

0 comments3 min readLW link

(affablyevil.substack.com)

Creating a family with GPT-4

Kaj_Sotala28 Mar 2023 6:40 UTC

23 points

3 comments10 min readLW link

(kajsotala.fi)

Some 2-4-6 problems

abstractapplic28 Mar 2023 6:32 UTC

28 points

9 comments1 min readLW link

(h-b-p.github.io)

[Question] Deep folding docs site?

mcint28 Mar 2023 6:01 UTC

−1 points

2 comments1 min readLW link

[Question] Why does advanced AI want not to be shut down?

RedFishBlueFish28 Mar 2023 4:26 UTC

2 points

19 comments1 min readLW link

100 Dinners And A Workshop: Information Preservation And Goals

Stephen Fowler28 Mar 2023 3:13 UTC

8 points

0 comments7 min readLW link