How to Give Coming AGI’s the Best Chance of Figuring Out Ethics for Us

sweenesm · May 23, 2024, 7:44 PM
1 point
2 comments · 10 min read · LW link

Mentorship in AGI Safety (MAGIS) call for mentors

May 23, 2024, 6:28 PM
31 points
3 comments · 2 min read · LW link

Quick Thoughts on Scaling Monosemanticity

Joel Burget · May 23, 2024, 4:22 PM
28 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

The case for stopping AI safety research

catubc · May 23, 2024, 3:55 PM
53 points
38 comments · 1 min read · LW link

[Question] SAE sparse feature graph using only residual layers

Jaehyuk Lim · May 23, 2024, 1:32 PM
0 points
3 comments · 1 min read · LW link

[Question] Are most people deeply confused about “love”, or am I missing a human universal?

SpectrumDT · May 23, 2024, 1:22 PM
13 points
28 comments · 3 min read · LW link

Executive Dysfunction 101

DaystarEld · May 23, 2024, 12:43 PM
28 points
1 comment · 3 min read · LW link
(daystareld.com)

AI #65: I Spy With My AI

Zvi · May 23, 2024, 12:40 PM
28 points
7 comments · 43 min read · LW link
(thezvi.wordpress.com)

What mistakes has the AI safety movement made?

EuanMcLean · May 23, 2024, 11:19 AM
64 points
29 comments · 12 min read · LW link

What should AI safety be trying to achieve?

EuanMcLean · May 23, 2024, 11:17 AM
17 points
1 comment · 13 min read · LW link

What will the first human-level AI look like, and how might things go wrong?

EuanMcLean · May 23, 2024, 11:17 AM
20 points
2 comments · 15 min read · LW link

Big Picture AI Safety: Introduction

EuanMcLean · May 23, 2024, 11:15 AM
46 points
7 comments · 5 min read · LW link

Paper in Science: Managing extreme AI risks amid rapid progress

JanB · May 23, 2024, 8:40 AM
50 points
2 comments · 1 min read · LW link

Power Law Policy

Ben Turtel · May 23, 2024, 5:28 AM
4 points
7 comments · 6 min read · LW link
(bturtel.substack.com)

Why entropy means you might not have to worry as much about superintelligent AI

Ron J · May 23, 2024, 3:52 AM
−26 points
1 comment · 2 min read · LW link

Quick Thoughts on Our First Sampling Run

jefftk · May 23, 2024, 12:20 AM
29 points
3 comments · 2 min read · LW link
(www.jefftk.com)

AI Safety proposal - Influencing the superintelligence explosion

Morgan · May 22, 2024, 11:31 PM
0 points
2 comments · 7 min read · LW link

Implementing Asimov’s Laws of Robotics - How I imagine alignment working.

Joshua Clancy · May 22, 2024, 11:15 PM
2 points
0 comments · 11 min read · LW link

Higher-Order Forecasts

ozziegooen · May 22, 2024, 9:49 PM
45 points
1 comment · LW link

A Positive Double Standard - Self-Help Principles Work For Individuals Not Populations

James Stephen Brown · May 22, 2024, 9:37 PM
8 points
3 comments · 5 min read · LW link

A Bi-Modal Brain Model

Johannes C. Mayer · May 22, 2024, 8:10 PM
12 points
3 comments · 2 min read · LW link

Offering service as a sensayer for simulationist-adjacent beliefs.

mako yass · May 22, 2024, 6:52 PM
22 points
0 comments · 1 min read · LW link

Do Not Mess With Scarlett Johansson

Zvi · May 22, 2024, 3:10 PM
65 points
7 comments · 16 min read · LW link
(thezvi.wordpress.com)

How Multiverse Theory dissolves Quantum inexplicability

mrdlm · May 22, 2024, 2:55 PM
0 points
0 comments · 1 min read · LW link

[Question] Should we be concerned about eating too much soy?

ChristianKl · May 22, 2024, 12:53 PM
18 points
3 comments · 1 min read · LW link

Procedural Executive Function, Part 3

DaystarEld · May 22, 2024, 11:58 AM
20 points
4 comments · LW link

Cicadas, Anthropic, and the bilateral alignment problem

kromem · May 22, 2024, 11:09 AM
28 points
6 comments · 5 min read · LW link

Announcing Human-aligned AI Summer School

May 22, 2024, 8:55 AM
50 points
0 comments · 1 min read · LW link
(humanaligned.ai)

“Which chains-of-thought was that faster than?”

Emrik · May 22, 2024, 8:21 AM
37 points
4 comments · 4 min read · LW link

Each Llama3-8b text uses a different “random” subspace of the activation space

tailcalled · May 22, 2024, 7:31 AM
3 points
4 comments · 7 min read · LW link

ARIA’s Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th

Brendon_Wong · May 22, 2024, 6:54 AM
11 points
0 comments · 1 min read · LW link
(www.aria.org.uk)

Anthropic announces interpretability advances. How much does this advance alignment?

Seth Herd · May 21, 2024, 10:30 PM
49 points
4 comments · 3 min read · LW link
(www.anthropic.com)

[Question] What would stop you from paying for an LLM?

yanni kyriacos · May 21, 2024, 10:25 PM
17 points
15 comments · 1 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · May 21, 2024, 8:15 PM
157 points
16 comments · 3 min read · LW link

Mitigating extreme AI risks amid rapid progress [Linkpost]

Orpheus16 · May 21, 2024, 7:59 PM
21 points
7 comments · 4 min read · LW link

The problem with rationality

David Loomis · May 21, 2024, 6:49 PM
−17 points
1 comment · 6 min read · LW link

rough draft on what happens in the brain when you have an insight

Emrik · May 21, 2024, 6:02 PM
11 points
2 comments · 1 min read · LW link

On Dwarkesh’s Podcast with OpenAI’s John Schulman

Zvi · May 21, 2024, 5:30 PM
73 points
4 comments · 20 min read · LW link
(thezvi.wordpress.com)

[Question] Is deleting capabilities still a relevant research question?

tailcalled · May 21, 2024, 1:24 PM
15 points
1 comment · 1 min read · LW link

New voluntary commitments (AI Seoul Summit)

Zach Stein-Perlman · May 21, 2024, 11:00 AM
81 points
17 comments · 7 min read · LW link
(www.gov.uk)

ACX/LW/EA/* Meetup Bremen

RasmusHB · May 21, 2024, 5:42 AM
2 points
0 comments · 1 min read · LW link

My Dating Heuristic

Declan Molony · May 21, 2024, 5:28 AM
26 points
4 comments · 2 min read · LW link

Scorable Functions: A Format for Algorithmic Forecasting

ozziegooen · May 21, 2024, 4:14 AM
29 points
0 comments · LW link

The Problem With the Word ‘Alignment’

May 21, 2024, 3:48 AM
63 points
8 comments · 6 min read · LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen · May 21, 2024, 2:22 AM
191 points
13 comments · LW link

Harmony Intelligence is Hiring!

May 21, 2024, 2:11 AM
10 points
0 comments · 1 min read · LW link
(www.harmonyintelligence.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.

Linch · May 20, 2024, 11:50 PM
31 points
8 comments · 1 min read · LW link
(variety.com)

Some perspectives on the discipline of Physics

Tahp · May 20, 2024, 6:19 PM
17 points
3 comments · 13 min read · LW link
(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?

Joe Kwon · May 20, 2024, 6:03 PM
3 points
1 comment · 1 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method

May 20, 2024, 5:55 PM
23 points
7 comments · 6 min read · LW link