AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI

Dan H and Orpheus16

23 May 2023 21:47 UTC

25 points

0 comments 6 min read LW link

(newsletter.safe.ai)

The Polarity Problem [Draft]

Dan H, cdkg and Simon Goldstein

23 May 2023 21:05 UTC

24 points

3 comments 44 min read LW link

Progress links and tweets, 2023-05-23

jasoncrawford

23 May 2023 20:15 UTC

16 points

0 comments 1 min read LW link

(rootsofprogress.org)

Yoshua Bengio: How Rogue AIs may Arise

harfe

23 May 2023 18:28 UTC

92 points

12 comments 18 min read LW link

(yoshuabengio.org)

‘Fundamental’ vs ‘applied’ mechanistic interpretability research

Lee Sharkey

23 May 2023 18:26 UTC

65 points

6 comments 3 min read LW link

Coercion is an adaptation to scarcity; trust is an adaptation to abundance

Richard_Ngo

23 May 2023 18:14 UTC

91 points

11 comments 4 min read LW link

[Question] Is “brittle alignment” good enough?

the8thbit

23 May 2023 17:35 UTC

9 points

5 comments 3 min read LW link

Will Artificial Superintelligence Kill Us?

James_Miller

23 May 2023 16:27 UTC

33 points

2 comments 22 min read LW link

Phone Number Jingle

jefftk

23 May 2023 15:20 UTC

11 points

12 comments 1 min read LW link

(www.jefftk.com)

GPT4 is capable of writing decent long-form science fiction (with the right prompts)

RomanS

23 May 2023 13:41 UTC

22 points

28 comments 65 min read LW link

[Question] Do humans still provide value in correspondence chess?

Jonathan Paulson

23 May 2023 12:15 UTC

24 points

16 comments 1 min read LW link

[Linkpost] The AGI Show podcast

Soroush Pour

23 May 2023 9:52 UTC

4 points

0 comments 1 min read LW link

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis

23 May 2023 5:34 UTC

16 points

15 comments 1 min read LW link

How I learned to stop worrying and love skill trees

junk heap homotopy

23 May 2023 4:08 UTC

83 points

3 comments 1 min read LW link

T-Shirt Size Distribution

jefftk

23 May 2023 2:40 UTC

9 points

0 comments 1 min read LW link

(www.jefftk.com)

AI self-improvement is possible

bhauth

23 May 2023 2:32 UTC

18 points

3 comments 8 min read LW link

Worrying less about acausal extortion

Raemon

23 May 2023 2:08 UTC

42 points

12 comments 13 min read LW link

Self-leadership and self-love dissolve anger and trauma

Richard_Ngo

22 May 2023 22:30 UTC

74 points

7 comments 5 min read LW link

A Manifold market notice: Binance

Scrooge Mcduck

22 May 2023 22:24 UTC

15 points

13 comments 1 min read LW link

I don’t want to talk about AI

KirstenH

22 May 2023 21:23 UTC

34 points

11 comments 2 min read LW link

(ealifestyles.substack.com)

Activation additions in a small residual network

Garrett Baker

22 May 2023 20:28 UTC

22 points

4 comments 3 min read LW link

[Linkpost] “Governance of superintelligence” by OpenAI

Daniel_Eth

22 May 2023 20:15 UTC

67 points

20 comments 2 min read LW link

(openai.com)

Two Pieces of Advice About How to Remember Things

Bentham's Bulldog

22 May 2023 18:10 UTC

13 points

3 comments 4 min read LW link

Why I Believe LLMs Do Not Have Human-like Emotions

Onid

22 May 2023 15:46 UTC

13 points

6 comments 7 min read LW link

AI Safety in China: Part 2

Lao Mein

22 May 2023 14:50 UTC

104 points

28 comments 2 min read LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala

22 May 2023 14:31 UTC

156 points

5 comments 3 min read LW link

(www.conjecture.dev)

Papers, Please #1: Various Papers on Employment, Wages and Productivity

Zvi

22 May 2023 12:00 UTC

42 points

2 comments 8 min read LW link

(thezvi.wordpress.com)

In Defense of «The Army of Jakoths»

MikkW

22 May 2023 11:59 UTC

−14 points

10 comments 4 min read LW link

Speed of information input is a bottleneck for rationality

MikkW

22 May 2023 10:24 UTC

13 points

0 comments 4 min read LW link

Distillation of Neurotech and Alignment Workshop January 2023

lisathiergart and Sumner L Norman

22 May 2023 7:17 UTC

53 points

9 comments 14 min read LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo

22 May 2023 5:49 UTC

55 points

5 comments 2 min read LW link

(thetreacherousturn.ai)

The Stanley Parable: Making philosophy fun

Nathan1123

22 May 2023 2:15 UTC

6 points

3 comments 3 min read LW link

Sea Monsters

Adam Zerner

22 May 2023 0:58 UTC

30 points

11 comments 5 min read LW link

The Army of Jakoths (a parable)

MikkW

21 May 2023 22:48 UTC

−6 points

0 comments 1 min read LW link

A&I (Rihanna ‘S&M’ parody lyrics)

nahoj

21 May 2023 22:34 UTC

−2 points

0 comments 2 min read LW link

Four Battlegrounds: Power in the Age of Artificial Intelligence (Book review)

PeterMcCluskey

21 May 2023 21:19 UTC

25 points

0 comments 4 min read LW link

(bayesianinvestor.com)

Gender Vectors in ROME’s Latent Space

Xodarap

21 May 2023 18:46 UTC

14 points

2 comments 3 min read LW link

Weight by Impact

Vaniver

21 May 2023 14:37 UTC

29 points

1 comment 3 min read LW link

[outdated] My current theory of change to mitigate existential risk by misaligned ASI

mesaoptimizer

21 May 2023 13:46 UTC

32 points

8 comments 6 min read LW link

(mesaoptimizer.com)

Babble on growing trust

qbolec

21 May 2023 13:19 UTC

13 points

1 comment 5 min read LW link

Elevator Positioning

jefftk

21 May 2023 11:30 UTC

15 points

1 comment 1 min read LW link

(www.jefftk.com)

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks

RogerDearnaley

21 May 2023 8:29 UTC

11 points

1 comment 4 min read LW link

Jeff Clune advertising a postdoc on twitter...and asking where he should target his posts

Joyee Chen

21 May 2023 1:02 UTC

4 points

0 comments 1 min read LW link

Running Sound for Yourself

jefftk

20 May 2023 22:10 UTC

11 points

0 comments 2 min read LW link

(www.jefftk.com)

Job Opening: SWE to help build signature vetting system for AI-related petitions

Ethan Ashkie and Andrew_Critch

20 May 2023 19:02 UTC

52 points

0 comments 1 min read LW link

My Kind of Pragmatism

Nora Belrose

20 May 2023 18:58 UTC

38 points

11 comments 3 min read LW link

Colors Appear To Have Almost-Universal Symbolic Associations

Thoth Hermes

20 May 2023 18:40 UTC

−33 points

4 comments 7 min read LW link

(thothhermes.substack.com)

Twiblings, four-parent babies and other reproductive technology

GeneSmith

20 May 2023 17:11 UTC

192 points

33 comments 6 min read LW link

P-zombies, Compression and the Simulation Hypothesis

RussellThor

20 May 2023 11:36 UTC

5 points

0 comments 5 min read LW link

The possible shared Craft of deliberate Lexicogenesis

TsviBT

20 May 2023 5:56 UTC

71 points

5 comments 61 min read LW link