AI Safety proposal—Influencing the superintelligence explosion
Morgan · 22 May 2024 23:31 UTC · 0 points · 2 comments · 7 min read · LW link

Implementing Asimov’s Laws of Robotics—How I imagine alignment working.
Joshua Clancy · 22 May 2024 23:15 UTC · 2 points · 0 comments · 11 min read · LW link

Higher-Order Forecasts
ozziegooen · 22 May 2024 21:49 UTC · 45 points · 1 comment · 3 min read · LW link

A Positive Double Standard—Self-Help Principles Work For Individuals Not Populations
James Stephen Brown · 22 May 2024 21:37 UTC · 8 points · 3 comments · 5 min read · LW link

A Bi-Modal Brain Model
Johannes C. Mayer · 22 May 2024 20:10 UTC · 12 points · 3 comments · 2 min read · LW link

Offering service as a sensayer for simulationist-adjacent beliefs.
mako yass · 22 May 2024 18:52 UTC · 22 points · 0 comments · 1 min read · LW link

Do Not Mess With Scarlett Johansson
Zvi · 22 May 2024 15:10 UTC · 65 points · 7 comments · 16 min read · LW link
(thezvi.wordpress.com)

How Multiverse Theory dissolves Quantum inexplicability
mrdlm · 22 May 2024 14:55 UTC · 0 points · 0 comments · 1 min read · LW link

[Question] Should we be concerned about eating too much soy?
ChristianKl · 22 May 2024 12:53 UTC · 18 points · 3 comments · 1 min read · LW link

Procedural Executive Function, Part 3
DaystarEld · 22 May 2024 11:58 UTC · 21 points · 4 comments · 23 min read · LW link

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 22 May 2024 11:09 UTC · 28 points · 6 comments · 5 min read · LW link

Announcing Human-aligned AI Summer School
22 May 2024 8:55 UTC · 51 points · 0 comments · 1 min read · LW link
(humanaligned.ai)

“Which chains-of-thought was that faster than?”
Emrik · 22 May 2024 8:21 UTC · 37 points · 4 comments · 4 min read · LW link

Each Llama3-8b text uses a different “random” subspace of the activation space
tailcalled · 22 May 2024 7:31 UTC · 3 points · 4 comments · 7 min read · LW link

ARIA’s Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th
Brendon_Wong · 22 May 2024 6:54 UTC · 11 points · 0 comments · 1 min read · LW link
(www.aria.org.uk)

Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 21 May 2024 22:30 UTC · 49 points · 4 comments · 3 min read · LW link
(www.anthropic.com)

[Question] What would stop you from paying for an LLM?
yanni kyriacos · 21 May 2024 22:25 UTC · 17 points · 15 comments · 1 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 21 May 2024 20:15 UTC · 157 points · 16 comments · 3 min read · LW link

Mitigating extreme AI risks amid rapid progress [Linkpost]
Orpheus16 · 21 May 2024 19:59 UTC · 21 points · 7 comments · 4 min read · LW link

The problem with rationality
David Loomis · 21 May 2024 18:49 UTC · −17 points · 1 comment · 6 min read · LW link

rough draft on what happens in the brain when you have an insight
Emrik · 21 May 2024 18:02 UTC · 11 points · 2 comments · 1 min read · LW link

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 21 May 2024 17:30 UTC · 73 points · 4 comments · 20 min read · LW link
(thezvi.wordpress.com)

[Question] Is deleting capabilities still a relevant research question?
tailcalled · 21 May 2024 13:24 UTC · 15 points · 1 comment · 1 min read · LW link

New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 21 May 2024 11:00 UTC · 81 points · 17 comments · 7 min read · LW link
(www.gov.uk)

ACX/LW/EA/* Meetup Bremen
RasmusHB · 21 May 2024 5:42 UTC · 2 points · 0 comments · 1 min read · LW link

My Dating Heuristic
Declan Molony · 21 May 2024 5:28 UTC · 27 points · 4 comments · 2 min read · LW link

Scorable Functions: A Format for Algorithmic Forecasting
ozziegooen · 21 May 2024 4:14 UTC · 29 points · 0 comments · 8 min read · LW link

The Problem With the Word ‘Alignment’
21 May 2024 3:48 UTC · 63 points · 8 comments · 6 min read · LW link

What’s Going on With OpenAI’s Messaging?
ozziegooen · 21 May 2024 2:22 UTC · 191 points · 13 comments · 3 min read · LW link

Harmony Intelligence is Hiring!
21 May 2024 2:11 UTC · 10 points · 0 comments · 1 min read · LW link
(www.harmonyintelligence.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.
Linch · 20 May 2024 23:50 UTC · 31 points · 8 comments · 1 min read · LW link
(variety.com)

Some perspectives on the discipline of Physics
Tahp · 20 May 2024 18:19 UTC · 18 points · 3 comments · 13 min read · LW link
(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?
Joe Kwon · 20 May 2024 18:03 UTC · 3 points · 1 comment · 1 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method
20 May 2024 17:55 UTC · 23 points · 7 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
20 May 2024 17:53 UTC · 108 points · 4 comments · 3 min read · LW link

NAO Updates, Spring 2024
jefftk · 20 May 2024 16:51 UTC · 13 points · 0 comments · 6 min read · LW link
(naobservatory.org)

OpenAI: Exodus
Zvi · 20 May 2024 13:10 UTC · 153 points · 26 comments · 44 min read · LW link
(thezvi.wordpress.com)

Infra-Bayesian haggling
hannagabor · 20 May 2024 12:23 UTC · 28 points · 0 comments · 20 min read · LW link

Jaan Tallinn’s 2023 Philanthropy Overview
jaan · 20 May 2024 12:11 UTC · 203 points · 5 comments · 1 min read · LW link
(jaan.info)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 20 May 2024 9:38 UTC · 31 points · 2 comments · 1 min read · LW link

Why I find Davidad’s plan interesting
Paul W · 20 May 2024 8:13 UTC · 18 points · 0 comments · 6 min read · LW link

Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds · 20 May 2024 4:14 UTC · 30 points · 21 comments · 10 min read · LW link
(www.anthropic.com)

The consistent guessing problem is easier than the halting problem
jessicata · 20 May 2024 4:02 UTC · 38 points · 5 comments · 4 min read · LW link
(unstableontology.com)

A poem titled ‘Tick Tock’.
Krantz · 20 May 2024 3:52 UTC · −1 points · 0 comments · 1 min read · LW link

Against Computers (infinite play)
rogersbacon · 20 May 2024 0:43 UTC · −11 points · 1 comment · 14 min read · LW link
(www.secretorum.life)

Testing for parallel reasoning in LLMs
19 May 2024 15:28 UTC · 9 points · 7 comments · 9 min read · LW link

Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)
O O · 19 May 2024 2:18 UTC · 14 points · 15 comments · 2 min read · LW link

Some “meta-cruxes” for AI x-risk debates
Aryeh Englander · 19 May 2024 0:21 UTC · 20 points · 2 comments · 3 min read · LW link

On Privilege
Shmi · 18 May 2024 22:36 UTC · 16 points · 10 comments · 2 min read · LW link

Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
Johannes C. Mayer · 18 May 2024 19:53 UTC · 22 points · 37 comments · 6 min read · LW link