[Question] AI interpretability could be harmful?

Roman Leventov · May 10, 2023, 8:43 PM
13 points
2 comments · 1 min read · LW link

Athens, Greece – ACX Meetups Everywhere Spring 2023

Spyros Dovas · May 10, 2023, 7:45 PM
1 point
0 comments · 1 min read · LW link

Better debates

TsviBT · May 10, 2023, 7:34 PM
78 points
7 comments · 3 min read · LW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

May 10, 2023, 7:04 PM
256 points
54 comments · 21 min read · LW link

A Corrigibility Metaphore—Big Gambles

WCargo · May 10, 2023, 6:13 PM
16 points
0 comments · 4 min read · LW link

Roadmap for a collaborative prototype of an Open Agency Architecture

Deger Turan · May 10, 2023, 5:41 PM
31 points
0 comments · 12 min read · LW link

AGI-Automated Interpretability is Suicide

__RicG__ · May 10, 2023, 2:20 PM
25 points
33 comments · 7 min read · LW link

Class-Based Addressing

jefftk · May 10, 2023, 1:40 PM
22 points
6 comments · 1 min read · LW link
(www.jefftk.com)

In defence of epistemic modesty [distillation]

Luise · May 10, 2023, 9:44 AM
17 points
2 comments · 9 min read · LW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCC · May 10, 2023, 9:14 AM
5 points
0 comments · 1 min read · LW link

10 great reasons why Lex Fridman should invite Eliezer and Robin to re-do the FOOM debate on his podcast

chaosmage · May 10, 2023, 8:27 AM
−7 points
1 comment · 1 min read · LW link
(www.reddit.com)

New OpenAI Paper—Language models can explain neurons in language models

MrThink · May 10, 2023, 7:46 AM
47 points
14 comments · 1 min read · LW link

Naturalist Experimentation

LoganStrohl · May 10, 2023, 4:28 AM
62 points
14 comments · 10 min read · LW link

[Question] Could A Superintelligence Out-Argue A Doomer?

tjaffee · May 10, 2023, 2:40 AM
−16 points
6 comments · 1 min read · LW link

Gradient hacking via actual hacking

Max H · May 10, 2023, 1:57 AM
12 points
7 comments · 3 min read · LW link

Red teaming: challenges and research directions

joshc · May 10, 2023, 1:40 AM
31 points
1 comment · 10 min read · LW link

[Question] Looking for a post I read if anyone recognizes it

SilverFlame · May 10, 2023, 1:24 AM
2 points
2 comments · 1 min read · LW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI · May 9, 2023, 9:54 PM
9 points
0 comments · 9 min read · LW link
(aizi.substack.com)

Stopping dangerous AI: Ideal US behavior

Zach Stein-Perlman · May 9, 2023, 9:00 PM
17 points
0 comments · 3 min read · LW link

Stopping dangerous AI: Ideal lab behavior

Zach Stein-Perlman · May 9, 2023, 9:00 PM
8 points
0 comments · 2 min read · LW link

Progress links and tweets, 2023-05-09

jasoncrawford · May 9, 2023, 8:22 PM
14 points
0 comments · 2 min read · LW link
(rootsofprogress.org)

[Question] Have you heard about MIT’s “liquid neural networks”? What do you think about them?

Ppau · May 9, 2023, 8:16 PM
35 points
14 comments · 1 min read · LW link

Respect for Boundaries as non-arbitrary coordination norms

Jonas Hallgren · May 9, 2023, 7:42 PM
9 points
3 comments · 7 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment · 10 min read · LW link

Forecasting as a tool for teaching the general public to make better judgements?

Dominik Hajduk | České priority · May 9, 2023, 5:35 PM
3 points
0 comments · 3 min read · LW link

Language models can explain neurons in language models

nz · May 9, 2023, 5:29 PM
23 points
0 comments · 1 min read · LW link
(openai.com)

Asimov on building robots without the First Law

rossry · May 9, 2023, 4:44 PM
4 points
1 comment · 2 min read · LW link

Making Up Baby Signs

jefftk · May 9, 2023, 4:40 PM
44 points
6 comments · 2 min read · LW link
(www.jefftk.com)

Exciting New Interpretability Paper!

research_prime_space · May 9, 2023, 4:39 PM
12 points
1 comment · 1 min read · LW link

Result Of The Bounty/Contest To Explain Infra-Bayes In The Language Of Game Theory

johnswentworth · May 9, 2023, 4:35 PM
79 points
0 comments · 1 min read · LW link

The Bleak Harmony of Diets and Survival: A Glimpse into Nature’s Unforgiving Balance

bardstale · May 9, 2023, 4:08 PM
−16 points
0 comments · 1 min read · LW link

Entropic Abyss

bardstale · May 9, 2023, 3:59 PM
−12 points
0 comments · 2 min read · LW link

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

May 9, 2023, 3:26 PM
28 points
1 comment · 4 min read · LW link
(newsletter.safe.ai)

A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens

Martin Fell · May 9, 2023, 2:36 PM
26 points
9 comments · 6 min read · LW link

How to Interpret Prediction Market Prices as Probabilities

SimonM · May 9, 2023, 2:12 PM
14 points
1 comment · 4 min read · LW link

Stampy’s AI Safety Info—New Distillations #2 [April 2023]

markov · May 9, 2023, 1:31 PM
25 points
1 comment · 1 min read · LW link
(aisafety.info)

Quote quiz answer

jasoncrawford · May 9, 2023, 1:27 PM
19 points
0 comments · 4 min read · LW link
(rootsofprogress.org)

[Question] Does reversible computation let you compute the complexity class PSPACE as efficiently as normal computers compute the complexity class P?

Noosphere89 · May 9, 2023, 1:18 PM
6 points
14 comments · 1 min read · LW link

EconTalk podcast: “Eliezer Yudkowsky on the Dangers of AI”

TekhneMakre · May 9, 2023, 11:14 AM
15 points
1 comment · 1 min read · LW link
(www.econtalk.org)

Most people should probably feel safe most of the time

Kaj_Sotala · May 9, 2023, 9:35 AM
95 points
28 comments · 10 min read · LW link

Summaries of top forum posts (1st to 7th May 2023)

Zoe Williams · May 9, 2023, 9:30 AM
21 points
0 comments · LW link

Focusing on longevity research as a way to avoid the AI apocalypse

Random Trader · May 9, 2023, 4:47 AM
14 points
2 comments · 2 min read · LW link

When is Goodhart catastrophic?

May 9, 2023, 3:59 AM
180 points
29 comments · 8 min read · LW link · 1 review

Chilean AIS Hackathon Retrospective

agucova · May 9, 2023, 1:34 AM
9 points
0 comments · LW link

Announcing “Key Phenomena in AI Risk” (facilitated reading group)

May 9, 2023, 12:31 AM
65 points
4 comments · 2 min read · LW link

Yoshua Bengio argues for tool-AI and to ban “executive-AI”

habryka · May 9, 2023, 12:13 AM
53 points
15 comments · 7 min read · LW link
(yoshuabengio.org)

South Bay ACX/LW Meetup

IS · May 8, 2023, 11:55 PM
2 points
0 comments · 1 min read · LW link

H-JEPA might be technically alignable in a modified form

Roman Leventov · May 8, 2023, 11:04 PM
12 points
2 comments · 7 min read · LW link

All AGI Safety questions welcome (especially basic ones) [May 2023]

steven0461 · May 8, 2023, 10:30 PM
33 points
44 comments · 2 min read · LW link

Predictable updating about AI risk

Joe Carlsmith · May 8, 2023, 9:53 PM
293 points
25 comments · 36 min read · LW link · 1 review