Computational Mechanics Hackathon (June 1 & 2) · Adam Shai · May 24, 2024, 10:18 PM · 34 points · 5 comments · 1 min read · LW link
[Question] Request for comments/opinions/ideas on safety/ethics for use of tool AI in a large healthcare system. · bokov · May 24, 2024, 8:53 PM · 5 points · 2 comments · 1 min read · LW link
NYU Code Debates Update/Postmortem · David Rein · May 24, 2024, 4:08 PM · 27 points · 4 comments · 10 min read · LW link
AI companies aren’t really using external evaluators · Zach Stein-Perlman · May 24, 2024, 4:01 PM · 242 points · 15 comments · 4 min read · LW link
The Schumer Report on AI (RTFB) · Zvi · May 24, 2024, 3:10 PM · 34 points · 3 comments · 36 min read · LW link (thezvi.wordpress.com)
minutes from a human-alignment meeting · bhauth · May 24, 2024, 5:01 AM · 67 points · 4 comments · 2 min read · LW link
Talent Needs of Technical AI Safety Teams · yams, Carson Jones, McKennaFitzgerald and Ryan Kidd · May 24, 2024, 12:36 AM · 118 points · 65 comments · 14 min read · LW link
How to Give Coming AGI’s the Best Chance of Figuring Out Ethics for Us · sweenesm · May 23, 2024, 7:44 PM · 1 point · 2 comments · 10 min read · LW link
Mentorship in AGI Safety (MAGIS) call for mentors · Valentin2026 and Joe Rogero · May 23, 2024, 6:28 PM · 31 points · 3 comments · 2 min read · LW link
Quick Thoughts on Scaling Monosemanticity · Joel Burget · May 23, 2024, 4:22 PM · 28 points · 1 comment · 4 min read · LW link (transformer-circuits.pub)
The case for stopping AI safety research · catubc · May 23, 2024, 3:55 PM · 53 points · 38 comments · 1 min read · LW link
[Question] SAE sparse feature graph using only residual layers · Jaehyuk Lim · May 23, 2024, 1:32 PM · 0 points · 3 comments · 1 min read · LW link
[Question] Are most people deeply confused about “love”, or am I missing a human universal? · SpectrumDT · May 23, 2024, 1:22 PM · 13 points · 28 comments · 3 min read · LW link
Executive Dysfunction 101 · DaystarEld · May 23, 2024, 12:43 PM · 28 points · 1 comment · 3 min read · LW link (daystareld.com)
AI #65: I Spy With My AI · Zvi · May 23, 2024, 12:40 PM · 28 points · 7 comments · 43 min read · LW link (thezvi.wordpress.com)
What mistakes has the AI safety movement made? · EuanMcLean · May 23, 2024, 11:19 AM · 64 points · 29 comments · 12 min read · LW link
What should AI safety be trying to achieve? · EuanMcLean · May 23, 2024, 11:17 AM · 17 points · 1 comment · 13 min read · LW link
What will the first human-level AI look like, and how might things go wrong? · EuanMcLean · May 23, 2024, 11:17 AM · 20 points · 2 comments · 15 min read · LW link
Big Picture AI Safety: Introduction · EuanMcLean · May 23, 2024, 11:15 AM · 46 points · 7 comments · 5 min read · LW link
Paper in Science: Managing extreme AI risks amid rapid progress · JanB · May 23, 2024, 8:40 AM · 50 points · 2 comments · 1 min read · LW link
Power Law Policy · Ben Turtel · May 23, 2024, 5:28 AM · 4 points · 7 comments · 6 min read · LW link (bturtel.substack.com)
Why entropy means you might not have to worry as much about superintelligent AI · Ron J · May 23, 2024, 3:52 AM · −26 points · 1 comment · 2 min read · LW link
Quick Thoughts on Our First Sampling Run · jefftk · May 23, 2024, 12:20 AM · 29 points · 3 comments · 2 min read · LW link (www.jefftk.com)
AI Safety proposal—Influencing the superintelligence explosion · Morgan · May 22, 2024, 11:31 PM · 0 points · 2 comments · 7 min read · LW link
Implementing Asimov’s Laws of Robotics—How I imagine alignment working. · Joshua Clancy · May 22, 2024, 11:15 PM · 2 points · 0 comments · 11 min read · LW link
Higher-Order Forecasts · ozziegooen · May 22, 2024, 9:49 PM · 45 points · 1 comment · LW link
A Positive Double Standard—Self-Help Principles Work For Individuals Not Populations · James Stephen Brown · May 22, 2024, 9:37 PM · 8 points · 3 comments · 5 min read · LW link
A Bi-Modal Brain Model · Johannes C. Mayer · May 22, 2024, 8:10 PM · 12 points · 3 comments · 2 min read · LW link
Offering service as a sensayer for simulationist-adjacent beliefs. · mako yass · May 22, 2024, 6:52 PM · 22 points · 0 comments · 1 min read · LW link
Do Not Mess With Scarlett Johansson · Zvi · May 22, 2024, 3:10 PM · 65 points · 7 comments · 16 min read · LW link (thezvi.wordpress.com)
How Multiverse Theory dissolves Quantum inexplicability · mrdlm · May 22, 2024, 2:55 PM · 0 points · 0 comments · 1 min read · LW link
[Question] Should we be concerned about eating too much soy? · ChristianKl · May 22, 2024, 12:53 PM · 18 points · 3 comments · 1 min read · LW link
Procedural Executive Function, Part 3 · DaystarEld · May 22, 2024, 11:58 AM · 20 points · 4 comments · LW link
Cicadas, Anthropic, and the bilateral alignment problem · kromem · May 22, 2024, 11:09 AM · 28 points · 6 comments · 5 min read · LW link
Announcing Human-aligned AI Summer School · Jan_Kulveit and Tomáš Gavenčiak · May 22, 2024, 8:55 AM · 50 points · 0 comments · 1 min read · LW link (humanaligned.ai)
“Which chains-of-thought was that faster than?” · Emrik · May 22, 2024, 8:21 AM · 37 points · 4 comments · 4 min read · LW link
Each Llama3-8b text uses a different “random” subspace of the activation space · tailcalled · May 22, 2024, 7:31 AM · 3 points · 4 comments · 7 min read · LW link
ARIA’s Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th · Brendon_Wong · May 22, 2024, 6:54 AM · 11 points · 0 comments · 1 min read · LW link (www.aria.org.uk)
Anthropic announces interpretability advances. How much does this advance alignment? · Seth Herd · May 21, 2024, 10:30 PM · 49 points · 4 comments · 3 min read · LW link (www.anthropic.com)
[Question] What would stop you from paying for an LLM? · yanni kyriacos · May 21, 2024, 10:25 PM · 17 points · 15 comments · 1 min read · LW link
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · scasper · May 21, 2024, 8:15 PM · 157 points · 16 comments · 3 min read · LW link
Mitigating extreme AI risks amid rapid progress [Linkpost] · Orpheus16 · May 21, 2024, 7:59 PM · 21 points · 7 comments · 4 min read · LW link
The problem with rationality · David Loomis · May 21, 2024, 6:49 PM · −17 points · 1 comment · 6 min read · LW link
rough draft on what happens in the brain when you have an insight · Emrik · 21 May 2024 18:02 UTC · 11 points · 2 comments · 1 min read · LW link
On Dwarkesh’s Podcast with OpenAI’s John Schulman · Zvi · 21 May 2024 17:30 UTC · 73 points · 4 comments · 20 min read · LW link (thezvi.wordpress.com)
[Question] Is deleting capabilities still a relevant research question? · tailcalled · 21 May 2024 13:24 UTC · 15 points · 1 comment · 1 min read · LW link
New voluntary commitments (AI Seoul Summit) · Zach Stein-Perlman · 21 May 2024 11:00 UTC · 81 points · 17 comments · 7 min read · LW link (www.gov.uk)
ACX/LW/EA/* Meetup Bremen · RasmusHB · 21 May 2024 5:42 UTC · 2 points · 0 comments · 1 min read · LW link
My Dating Heuristic · Declan Molony · 21 May 2024 5:28 UTC · 26 points · 4 comments · 2 min read · LW link
Scorable Functions: A Format for Algorithmic Forecasting · ozziegooen · 21 May 2024 4:14 UTC · 29 points · 0 comments · LW link